VISUAL SENSITIVITY TO DIFFERENCES
IN VELOCITY 


ROBERT H. BROWN¹
United States Naval Research Laboratory 


In various reviews of the literature, 
psychologists have stressed the de- 
pendence of the perception of motion 
upon a multitude of factors. Ken- 
nedy (1936), for example, indicated 
in his review that this dependence 
necessitates rigorous control in the 
experimental method used for meas- 
uring thresholds. The need for careful 
analysis and experimentation, also 
stressed earlier by Neff (1936), has 
been restated more recently by Gra- 
ham (1951) and Gibson (1958). De- 
spite the caution suggested by these 
reviews, analysis of data available in 
the literature for a specific threshold 
proves fruitful for application to a 
more general form of behavior. The 
purpose of the present paper is to 
discuss this analysis. 

Visual sensitivity to differences in
velocity is commonly measured by 
presenting two objects which move 
at slightly different, but constant, 
speeds. The least detectable dif- 
ference in speed is the differential 
threshold for the magnitude of velocity. As an initial step in the
paper, consideration of angular speed indicates that it is the basic
unit of measurement involved in studies of the differential threshold.
Plotting differential thresholds for angular speed yields a meaningful
relation to a primary variable, the speed of object motion. From these
thresholds, the sensitivity is readily calculated and expressed in
terms of the ratio of the threshold to the speed. As a final step in
the paper, this Weber ratio for velocity is applied to tracking and
other predictive behavior.

¹ This review of research on the visual perception of movement and its
application to tracking and other predictive behavior has been improved
by the suggestions of colleagues, especially Joseph Dougherty, Robert
E. Gardner, Howard Gordon, Jr., and Franklin V. Taylor of the United
States Naval Research Laboratory.


DIFFERENTIAL SPEED THRESHOLDS 
Angular Speed


Graham (1951) has described the 
concept of visual angle and the util- 
ity of specifying stimulus extents in 
terms of the angle they subtend at 
the eye. Similarly, the visual angle 
per unit time or angular speed is a 
basic variable in experiments con- 
cerned with the visual perception of 
movement. Its use facilitates the 
comparison of data obtained under 
different conditions. For example, 
threshold measurements made in in- 
dependent experiments at varying 
observational distances are expressed 
in terms of a common measure, angu- 
lar speed. In addition, the use of 
angular speed as a stimulus specifica- 
tion may be necessary for good ex- 
perimental design. 

Fig. 1. Diagram representing components in the measurement of angular
speed.

In Figure 1, the axis of rotation at O may be specified in terms of a
convenient reference point such as the front surface of the cornea. The
radius of rotation (r) is given by the distance from the reference
point to the appropriate moving object.
When the eye looks steadily at fixa- 
tion point A, the line of regard OA is 
stationary. Alternatively, one may 
assume a rotating line of regard in 
experiments involving fixation on a 
moving object. Presently available 
data do not indicate unequivocally 
that the alternative assumptions 
yield a measurable difference in the 
perception of velocity. Fleischl 
(1882) reported that an object seen 
while fixating a stationary point 
moves subjectively faster than when 
followed by the eyes. Since Aubert 
(1886) confirmed the phenomenon, 
it has been called the Aubert-Fleischl 
paradox. However, the need for re- 
examination of the paradox is in- 
dicated by the recent work of Gibson, 
Smith, Steinschneider, and Johnson 
(1957). When they measured the ac- 
curacy of visual perception of motion, 
they found no difference for the two 
modes of observation. 

As a stimulus rotates about the reference point in Figure 1, its
instantaneous angular speed (ω) is given by:

$$\omega = \frac{d\theta}{dt} \tag{1}$$

where θ is the angle swept by the radius vector r in time t. The value
of θ is given by:

$$\theta = \frac{s}{r} \quad \text{(in radians)} \tag{2}$$

$$\theta = \frac{57.3\,s}{r} \quad \text{(in degrees)} \tag{3}$$

The measure angular speed may be used advantageously not only for
rotational motion but also for tangential motion. In Figure 1, the
rectilinear distance d is a close approximation to the arc s for
angular displacements of the magnitude usually used. For example, d
exceeds s by only 1% for a θ of 10°. Conversely, angular displacements
less than 10° may be calculated with less than 1% error by substituting
d for s in Equation 3. For greater displacements, θ is calculated from:

$$\theta = \arctan\frac{d}{r} \tag{4}$$

For uniform angular motion when ω is constant:

$$\omega = \frac{\theta}{t} \tag{5}$$


Although this equation is a special case of the earlier definition of
instantaneous angular speed in derivative form, it applies with very
few exceptions to experiments which have been conducted on the
perception of movement. By substitution for θ from Equation 2, uniform
angular speed may be described by:

$$\omega = \frac{s}{rt} \quad \text{(in radians per unit time)} \tag{6}$$

where the arc s and the radius r are expressed in the same units. As an
approximation for small angular displacements, we may substitute d for
s to obtain:

$$\omega = \frac{d}{rt} = \frac{v}{r} \quad \text{(in radians per unit time)} \tag{7}$$

$$\omega = \frac{57.3\,v}{r} \quad \text{(in degrees per unit time)} \tag{8}$$

where the uniform linear speed v and the observational distance r are
expressed in consistent units.

Fig. 2. Procedures used in presentations of stimulus motion:
(a) separate, (b) adjacent, (c) superimposed.
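
The relations among Equations 3, 4, and 8 can be checked numerically. The following is only an illustrative sketch: the function names and the viewing geometry (an observational distance of 200 cm and a linear speed of 10 cm per second) are assumptions chosen for the example, and the exact conversion factor from `math.degrees` is used in place of the rounded 57.3.

```python
import math

def angle_deg_small(d, r):
    """Equation 3 with d substituted for s: theta is about 57.3 * d / r degrees."""
    return math.degrees(d / r)

def angle_deg_exact(d, r):
    """Equation 4: theta = arctan(d / r), converted to degrees."""
    return math.degrees(math.atan(d / r))

def angular_speed_deg(v, r):
    """Equation 8: omega is about 57.3 * v / r degrees per unit time."""
    return math.degrees(v / r)

# Hypothetical geometry: observational distance r of 200 cm.
r = 200.0
# Rectilinear extent d that subtends exactly 10 degrees at distance r.
d = r * math.tan(math.radians(10.0))

# Relative amount by which the small-angle value exceeds the exact angle.
excess = angle_deg_small(d, r) / angle_deg_exact(d, r) - 1.0
print(f"small-angle excess at 10 degrees: {excess:.2%}")

# Angular speed of a target moving at 10 cm per second at this distance.
print(f"omega: {angular_speed_deg(10.0, r):.2f} degrees per second")
```

Under these assumptions the excess comes out close to the 1% figure quoted for a displacement of 10°, which is why the substitution of d for s is acceptable for small displacements only.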


The Differential Speed Threshold and 
Its Measurement 


The differential threshold for angular speed, Δω, may be defined in
terms of:

$$\Delta\omega = \omega_2 - \omega_1 \tag{9}$$

where ω₂ is a uniform angular speed an observer discriminates according
to a specified criterion from the constant rate of motion ω₁. In
measurements of Δω, the spatial relationship of ω₁ and ω₂ is critical.
Three procedures used to date involve stimuli which are separate,
adjacent, and superimposed. In Figure 2, a circle represents
schematically an outline of a display, such as a moving belt, rotating
disc, or cathode-ray tube, used in presenting ω₁ and ω₂. The speeds are
represented by the vectors in each display. In Procedure a, the stimuli
for the two speeds are spatially apart and are viewed by looking from
one display to the other. In Procedure b, the stimuli are in immediate
proximity. In Procedure c, they are superimposed on each other. Table 1
summarizes the most significant stimulus conditions present in
measurements of Δω.

At least six experiments have been reported for measurements involving
separate stimuli. Bourdon (1902) utilized two rotating white discs with
a black rectangle on the edge of each. The subject adjusted the speed
of one in increments until it was noticeably faster than the other.
Similar measurements were made by Brown (1931) and by Brown and Mize
(1932) for a black square moving upward on white paper which the
observer saw in either of two windows. Ekman and Dahlbäck (1956) and
Gibson et al. (1957) have made measurements involving the adjustment of
ω₂ for apparent equality with ω₁. The former utilized two apertures in
each of which alternately the observer saw the horizontal motion of
black vertical lines on white paper. The latter presented behind two
windows a downward moving wallpaper with a pattern of dots. Most
recently, Brandalise and Gottsdanker (1959) have had subjects adjust
the speed of rotation of a black disc with a white dot on its edge to
apparent equality with that of another. In these six experiments, the
measurements of Δω were based on comparisons of the two speeds which
were viewed separately in different places. Since the equipment
involved rotating drums or discs, stimulation was repetitive.

TABLE 1

STIMULUS CONDITIONS PRESENT IN MEASUREMENTS OF Δω

| Experimenters | Spatial Relation of ω₁ and ω₂ | Stimulus Frequency | Stimulus Objects | Direction of Motion | Field Extent (degrees) | Observational Distance (cm.) |
|---|---|---|---|---|---|---|
| Bourdon (1902) | Separate | Repetitive | Black rectangle on edge of 2 white discs | Circular | 6.4 | 200 |
| Brown (1931) | Separate | Repetitive | Black square on white paper | Rectilinear upward | 2.15-4.30 | 200 |
| Brown & Mize (1932) | Separate | Repetitive | Black square on white paper | Rectilinear upward | 2.15-4.30 | 200 |
| Zegers (1948) | Superimposed | Single | 2 needles perpendicular to line of sight | Rectilinear to S’s right | 3.6-15.0 | 15.9 |
| Hick (1950) | Adjacent | Single | Spot on oscilloscope | Rectilinear to S’s left | 4.8 | 53.3 |
| Ekman & Dahlbäck (1956) | Separate | Repetitive | Black vertical lines on white paper | Rectilinear to S’s right or left | | |
| Gibson, Smith, Steinschneider, & Johnson (1957) | Separate | Repetitive | Wallpaper with pattern of dots | Rectilinear downward | | |
| Notterman & Page (1957) | Adjacent | Single | Spot on oscilloscope | Rectilinear horizontal | | |
| Brandalise & Gottsdanker (1959) | Separate | Repetitive | White dot on edge of 2 black discs | Circular | | |

Use of a moving spot on an oscillo- 
scope has facilitated presentation of 
adjacent stimuli. During rectilinear 
motion of a pip at constant speed, an 
incremental change in speed is in- 
troduced. Hick (1950) and Notter- 
man and Page (1957) measured the 
differential threshold in speed for a 
pip as it was horizontally deflected 
across the face of a cathode-ray tube. 
Temporal features of this procedure 
differ from the first. The stimuli are 
presented only once and then in 
immediate succession. 

The procedure of superimposed stimuli may be illustrated by monocular
movement parallax. When two objects move at the same linear speed
perpendicular to the subject’s line of sight, the difference in their
angular speeds provides an indication of their distances from the
subject. As the objects are brought closer together, the difference in
angular speeds decreases to a threshold value. Zegers (1948) has
measured the differential threshold speed for two needles by this
procedure, which temporally involves the single presentation
simultaneously of ω₁ and ω₂.

TABLE 2

METHODOLOGY USED IN THE MEASUREMENTS OF Δω

| Experimenters | Psychophysical Method | Measure of Δω | No. of Subjects | No. of Speeds | No. of Measurements per Speed per Subject | Total No. of Measurements | Minimum Speed (degrees per sec.) | Maximum Speed (degrees per sec.) |
|---|---|---|---|---|---|---|---|---|
| Bourdon (1902) | Limits | Mean | | | 20 | 60 | | 5.04 |
| Brown (1931) | Limits | Mean | | | 10 | 40 | | 3.58 |
| Brown & Mize (1932) | Limits | Mean | | | 3-6 | 117 | | 4.58 |
| Zegers (1948), 3.6° field | Constant stimuli | Standard deviation | | | | | | |
| Zegers (1948), 15.0° field | Constant stimuli | Standard deviation | | | | | | |
| Hick (1950) | Constant stimuli | Mean | | | | | | |
| Ekman & Dahlbäck (1956) | Average error | Standard deviation | | | | | | |
| Gibson, Smith, Steinschneider, & Johnson (1957) | Average error | Standard deviation | | | | | | |
| Notterman & Page (1957) | Constant stimuli | Mean | | | | | | |
| Brandalise & Gottsdanker (1959) | Average error | Standard deviation | | | | | | |
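
The movement parallax procedure can be sketched numerically. The example below is an illustration only: the function names, the linear speed of 5 cm per second, and the distances are assumptions, and the small-angle form of Equation 8 is used for both objects.

```python
import math

def angular_speed_deg(v, r):
    """Small-angle angular speed (degrees per unit time) for linear speed v at distance r."""
    return math.degrees(v / r)

def parallax_difference(v, r_near, r_far):
    """Difference in angular speed between two objects sharing the linear speed v."""
    return angular_speed_deg(v, r_near) - angular_speed_deg(v, r_far)

# Hypothetical values: both needles move at 5 cm per second; the near one is at 100 cm.
v = 5.0
r_near = 100.0
for r_far in (110.0, 105.0, 101.0):
    diff = parallax_difference(v, r_near, r_far)
    print(f"far needle at {r_far:.0f} cm: difference = {diff:.4f} degrees per second")
```

As the far object is brought toward the near one, the difference in angular speed shrinks toward zero; the differential threshold is the smallest difference the observer still detects.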

The psychophysical method used has been less critical for measurements
of Δω than the spatial relationship of the stimuli. Table 2 lists
significant methodological charac- 
teristics for the nine experiments. 
Items specially worth noting are the 
limited range of speeds in most ex- 
periments and the small number of 
measurements in some studies. 


The Differential Threshold as a 
Function of Speed 

The marked effect of spatial order may be observed by inspection of
Figure 3, in which Δω is plotted against ω. The curves and their points
represent the use of adjacent, separate, and superimposed stimuli. The
measure of Δω is that indicated in Table 2. Since Brown (1931) and
Brown and Mize (1932) made only a small number of exploratory
measurements, the points plotted for their experiments are the
geometric means of values they reported for speeds 1-2, 2-3, 3-4, and
4-5° per second. The data plotted for superimposed stimuli represent
the monocular movement parallax thresholds obtained by Zegers (1948)
with the widest and narrowest visual fields of the four for which he
made measurements. Otherwise, the points represent all values reported
in the literature for Δω as listed in Table 2.

Fig. 3. The differential angular speed threshold (Δω) as a function of
the angular speed (ω) of stimuli which are presented spatially
adjacent, separate, and superimposed.

The solid lines have been drawn with unit slope and represent a
constant Weber fraction (Δω/ω). In the case of adjacent stimuli,
solution for the intercept constant by the method of least squares
yields the plotted equation:

$$\log \Delta\omega = -0.859 + \log \omega \tag{10}$$

It may be observed as a rough approximation that the differential
threshold increases in direct proportion to the angular speed of a
stimulus. Discrepancies in this approximation occur in a middle range
of speeds (1-5° per second) where the measured Δω falls below the
empirical straight line. At faster speeds, Δω increases at an
increasing rate with ω.
For separate stimuli, the least squares equation is as follows:

$$\log \Delta\omega = -1.114 + \log \omega \tag{11}$$

Under these conditions, the differential threshold increases in direct
proportion to speed from approximately 1 to 10° per second. The
differential threshold is greater at slow speeds, and less at fast
speeds, than the best fitting straight line of unit slope.
Data obtained for superimposed stimuli can be described by a constant
Weber fraction only under quite restricted conditions. Thus, for the
widest field (15°) a solid line is drawn between the points for the two
slowest speeds. Its equation is:

$$\log \Delta\omega = -2.893 + \log \omega \tag{12}$$



The rapid increase in the differential 
threshold with speed for superim- 
posed stimuli may be interpreted in 
terms of instability of the retinal 
image and intensity effects in indivi- 
dual cones. 
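
Because each fitted line has unit slope, its intercept fixes a constant Weber fraction: if log Δω = b + log ω, then Δω/ω = 10^b. The short sketch below applies this to the three intercepts. The function and variable names are assumptions for illustration, and the intercept for separate stimuli is taken as negative (-1.114), the sign that agrees with the Weber fraction of 0.0769 reported for separate stimuli.

```python
def weber_fraction(intercept):
    """For log10(d_omega) = intercept + log10(omega), return d_omega / omega."""
    return 10.0 ** intercept

# Intercepts of the unit-slope lines (Equations 10-12).
intercepts = {
    "adjacent": -0.859,
    "separate": -1.114,
    "superimposed": -2.893,
}

fractions = {name: weber_fraction(b) for name, b in intercepts.items()}
for name, k in fractions.items():
    print(f"{name}: Weber fraction = {k:.3g}")
```

The three values reproduce the fractions of about 0.138, 0.0769, and 0.00128 discussed for adjacent, separate, and superimposed stimuli.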

As Zegers (1948) indicates in dis- 
cussion of his results, high speeds in- 
terfere with good “pickup” of the 
stimuli as they appear in the visual 
field and, also, with adequate follow- 
ing movements of the eyes. The in- 
fluence of extent of visual field, so 
marked in Figure 3, was markedly 
decreased, if not eliminated, by pro- 
viding appropriate aids during con- 
trol experiments to fixation and 
stimulus “following.” Careful measurement of the vertical distance
between the curves for the 15 and 3.6°
fields indicates that they could very 
nearly be superimposed by a shift of 
0.905 log unit, the mean of separa- 
tions of 0.853, 0.947, 0.909, and 
0.909 log unit. We may infer that the 
vertical position of the curves de- 
pends primarily on stability of the 
retinal image. When stimulus condi- 
tions for good fixation of the stimulus 
are absent, the differential threshold 
function of Figure 3 is shifted uni- 
formly upward with decrease in ex- 
tent of the visual field. 
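
The mean separation quoted above, and what it implies on a linear scale, can be verified directly. This is a small arithmetic sketch with assumed variable names.

```python
# Vertical separations (log units) between the curves for the 15 and 3.6 degree fields.
separations = [0.853, 0.947, 0.909, 0.909]

# Arithmetic mean of the separations, in log units.
mean_shift = sum(separations) / len(separations)

# A uniform shift of mean_shift log units multiplies every threshold by this factor.
threshold_ratio = 10.0 ** mean_shift

print(f"mean separation: {mean_shift:.4f} log unit")
print(f"implied ratio of thresholds: {threshold_ratio:.2f}")
```

On this reading, thresholds for the narrow field are roughly eight times those for the wide field at each speed, which is what a uniform upward shift of the whole function amounts to.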

The shape of the curves for super- 
imposed stimuli appears to be de- 
pendent upon the intensity effects oc- 
curring in individual cones. Evidence 
for this inference is less direct than 
Zegers’ control experiments involv- 
ing improved conditions for fixation 
and pursuit of the stimulus. How- 
ever, it should be pointed out that 
Graham, Baker, Hecht, and Lloyd 
(1948) measured the differential 
threshold as a function of the lumi- 
nance of the stimulus field. Neutral 
tint filters were placed behind the 
metal tube through which the observer saw two needles, one above
the other, moving at constant and 
equal speeds back and forth across 
an illuminated field. Measurements 
of the precision of distance settings 
with one needle yielded differential 
angular speeds for different luminances of the visual field. The
decrease in Δω as a function of the increase in luminance is described
by Hecht’s intensity discrimination equation upon the assumption that
Δω is a measure of differences in diffraction luminances and provides a
ΔI seen against the general illumination, I.

In addition, measurements of the 
threshold luminance for a moving 
spot of light indicate that the in- 
tensity effect of speed is similar to the 
parallax effect of Figure 3. At mod- 
erate speeds, the threshold luminance 
for discrimination of motion increases 
in direct proportion to the speed 
(Brown, 1958). At faster speeds 
(greater than 10° per second), the 
luminance threshold increases at a 
disproportionate rate until it ap- 
proximates an asymptote at a limit- 
ing speed (30 to 40° per second). 
This relationship, like that found by 
Zegers, may be interpreted in terms 
of intensity effects occurring in in- 
dividual cones. As angular speed in- 
creases, the duration of passage of the 
image across a given cone is shortened. 
Since the intensity effect in each 
receptor unit is lessened, the lumi- 
nance for the moving spot or the 
differential angular speed of the 
needles must be increased. 


THE WEBER RATIO 


The Weber ratio provides a convenient measure by means of which
velocity discriminations may be compared with other sensory
discriminations and with performance in tracking and predicting. The
ratio of the differential threshold (Δω) to the magnitude of the
standard (ω₁) may be readily calculated from Equations 10-12 for
adjacent, separate, and superimposed stimuli. The best estimate of Δω/ω
for an unspecified ω is 0.138 for adjacent stimuli and 0.0769 for
separate stimuli. This difference has been confirmed by Notterman
(1959) in measurements made by an oscilloscope with both procedures.
Since his experiment excludes variations in stimulus conditions other
than the spatial order, Notterman’s interpretation of the difference is
of particular interest:

Subjects in the adjacent presentation case 
can base their discrimination on a comparison 
of the amount of time taken to traverse the 
initial and final 14 inches on the scope face, 
or—and this is important—they can disregard 
time and look for the jerk which occurs when 
the moving spot instantaneously increases its 
velocity. The subjects employing the sepa- 
rate presentation procedure do not have this 
option: since the standard and comparison 
stimuli are separated in time, there is no jerk. 
In short, the subjects of the (adjacent) pro- 
cedure may have changed the problem from 
one requiring a comparison of two velocities, 
to one requiring a judgment of the presence 
or absence of jerk (p. 3). 


The marked superiority of super- 
imposed stimuli in yielding a low 
Weber fraction is illustrated by the 


value of 0.00128 for two needles 
traversing an extent of 15° at angular 
speeds less than 5° per second. This 
superiority is readily understandable. 
Superimposition of one needle in 
front of the other provides an angular 
offset which Zegers has found to be a
basic determiner of the differential 
angular speed threshold. The angular 
offset is absent when stimuli are pre- 
sented adjacently in immediate suc- 
cession or separately in space and 
time. 

Fig. 4. The Weber ratio (Δω/ω) as a function of the angular speed (ω)
for discriminations utilizing adjacent, separate, and superimposed
stimuli.

Variation of the Weber fraction over the whole speed range is plotted
in Figure 4. The points represent geometric means of values determined
by different investigators at approximately the same angular speeds.
Thus, the top curve for adjacent stimuli is the average of values
obtained by Hick (1950) and Notter- 
man and Page (1957). Except for the 
point at the slowest speed (Hick) 
and at the fastest speed (Notterman 
and Page), each point is the geo- 
metric mean of the Weber fraction 
in both studies. A similar procedure 
has been followed in averaging meas- 
urements made with separate stimuli. 
For superimposed stimuli, the Weber 
fraction has been calculated directly 
from Zegers’ data. In this case, the 
ratio is directly proportional to the 
angular offset existing between the 
reference and comparison stimuli. 
As Zegers has indicated, the value of 
the angular offset (and the Weber 
ratio) increases with speed. 
Examination of the curves of 
Figure 4 suggests a useful empirical 
generalization. The Weber fraction 
for nonsuperimposed stimuli is approximately constant in the midrange
of angular speeds. Thus, in the range of 0.1 to 20° per second, Δω/ω
shows no greater change than a doubling. For adjacent stimuli, the
maximal ratio is only 2.2 times 
greater than the minimal ratio. For 
separate stimuli, there is a change 
by a factor of 1.9. Although the 
Weber fraction may be fairly con- 
stant in the middle range of stimulus 
values, the rapid rise of the curve for 
superimposed stimuli suggests that 
the ratio may increase markedly at 
extremes. 

The constancy of the Weber ratio 
for differential speed thresholds may 
be interpreted at a descriptive level 
for comparison with other sensory 
discriminations. Woodworth and 
Schlosberg (1954) have indicated that 
for many sensory discriminations the 
differential threshold is a measure of the variability of the effects
of stimulation, i.e., Δω = Kσ_ω. For discriminations of motion
according to Brown (1960), the variability in turn is proportional to
the speed, i.e., σ_ω = Cω. It is therefore not surprising that Δω/ω is
constant, at least
within limits which are not too well 
defined in Figure 4. 
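
The step from the two proportionalities to a constant ratio can be written out explicitly. Here σ_ω denotes the variability of the effects of stimulation, and K and C are the two constants of proportionality; the subscript notation is an assumption made for the illustration.

```latex
\begin{aligned}
\Delta\omega &= K\,\sigma_\omega, \qquad \sigma_\omega = C\,\omega \\
\Delta\omega &= K\,C\,\omega \\
\frac{\Delta\omega}{\omega} &= K\,C \quad \text{(a constant, independent of } \omega\text{)}
\end{aligned}
```

The ratio is constant only over the range of speeds in which both proportionalities hold.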

It is of interest to compare the 
magnitude of the ratio with that for 
other discriminations. Under op- 
timal conditions, the minimal Weber 
fraction with superimposed stimuli 
is comparable to that measured for 
pitch discrimination with a standard 
tone and a comparison tone differing 
slightly in frequency. Measurements 
of pitch discrimination indicate that 
Weber’s fraction is constant at about 
0.002 beyond 250 cycles per second, 
rising somewhat at the lower fre- 
quencies. The differential speed 
threshold ratio, as measured with 
separate stimuli, is comparable to 
the Weber fraction for lifted weights. 
When measured by lifting weights 
successively with one hand, the
Weber ratio is approximately 0.075 
for weights greater than 200 grams. 
We may conclude that the Weber 
ratio for differential speed thresholds 
not only is constant in a medium 
range of stimulus values, but also is of 
the same order of magnitude as that 
found for other discriminations. 


TRACKING BEHAVIOR 


Studies of tracking behavior illus- 
trate an application of the Weber 
ratio for differential speed thresholds. 
This application is of particular in- 
terest since earlier reviews have em- 
phasized the significant motor char- 
acteristics of tracking behavior (Bir- 
mingham & Taylor, 1954; Fitts, 
1951). Perceptual characteristics 


have been implied by occasional ob- 
servations that an operator tracks a 
target quickly and efficiently under 
optimal conditions because he es- 
timates its present speed and accel- 
eration and thereby anticipates its 


future motion. During World War 
II, for example, the systematic in- 
vestigation of the manual controls 
for antiaircraft fire control systems 
indicated the anticipatory nature of 
tracking, as discussed by Helson 
(1949). 

Foxboro studies. In the Foxboro 
studies directed by Helson, error was 
recorded for compensatory tracking 
in which the tracker tries to keep a 
moving pointer aligned as much as 
possible with a stationary reference 
pointer. Compensatory tracking may 
be contrasted with pursuit tracking 
in which both pointers move and the 
tracker aligns the following cursor 
under his control with the moving 
target pointer. In the Foxboro studies 
the tracker compensated for the dis- 
placement of a moving pointer, repre- 
senting the aiming point, from the 
actual position indicated by a sta- 


tionary pointer. Tracking error was 
measured by the time required for 
the target to move from its actual 
position to the aiming point. 

Speed of the handwheel rotation 
was a major variable controlling 
tracking accuracy. For a constant- 
speed unidirectional course, with an 
increase in rate of cranking, the 
tracking error decreased from 55 to 6 
milliseconds when a light handwheel 
of 2.25-inch radius was used (Fox- 
boro Company, 1943a). Since the 
tracking error was consistently of the 
order of milliseconds and could be as 
small as one hundredth of the fastest 
reaction time, it is evident that the 
tracker anticipated the future motion 
of the target and thereby avoided the 
series of oscillations his long reaction 
time would otherwise produce. 

For simple sinusoidal courses, the 
tracker not only anticipated the mo- 
tion of the target but also used an 
averaging motion of the handwheel 
when the course was of too high a 
frequency to follow exactly. As 
course frequency increased, the 
tracker eliminated terminal portions 
of swings. Inertia in the form of a 
heavy handwheel or a flywheel effect 
smoothed the direct tracking of 
courses not requiring high accelerations
and rapid reversals in direction
(Foxboro Company, 1943b). In addi- 
tion, the averaging type of behavior 
was dependent upon practice and 
familiarity with the course being 
tracked. 

Contemporary models for tracking 
behavior. Since World War II, the 
concept of feedback mechanisms has 
been generalized to the entire field of 
control and communication theory in 
machines and animals (Wiener, 1948). 
As applied to antiaircraft fire control 
behavior, the concept states that the 
tracker uses the difference between 
the stimulus of a target’s motion and 
his response as a new input to make
his motion correspond more closely 
to that of the target. Engineers 
analyzed human tracking performance in terms of simple servo
systems with feedback (James, Nichols, & Phillips, 1947; Ragazzini,
1948; Tustin, 1947). Stimulated by 
the mathematical systems equations 
which emerged from this analysis, 
psychologists have developed their 
own models to describe the behavior 
involved in minimizing the difference 
between two positions with control 
of one (Birmingham & Taylor, 1954; 
Fitts, 1951; Noble, Fitts, & Warren, 
1955). These models make two basic 
assumptions: intermittency of re- 
sponse, and predictiveness of re- 
sponse. 

Intermittency of tracking responses. 
Despite the smooth and apparently 
continuous appearance of efficient 
tracking, experimental evidence from 
several sources indicates that the 
tracker responds intermittently. 
First, a time record of tracking per- 
formance shows a typical periodicity 
with a predominant frequency of two 
responses per second (Craik, 1947; 
Ellson, Hill, & Gray, 1947). Second, 
analysis of the response patterns to a 
step input displacement of position 
shows that quick corrective move- 
ments occur without visual or kines- 
thetic guidance and that the typical 
time for completing a corrective
movement, including reaction time, 
is approximately 0.5 second (Cherni- 
koff & Taylor, 1952; Searle & Taylor, 
1948; Taylor & Birmingham, 1948). 
Third, the assumption that the 
tracker responds intermittently at 
0.5-second intervals during continuous
tracking agrees with the optimal
time constant obtained for conven- 
tional aided tracking (Birmingham & 
Taylor, 1954; Mechler, Russell, & 
Preston, 1949). Fourth, with the as- 


sumption of 0.5-second intermittency 
of corrections, one may predict the 
optimal time constants for more com- 
plex aided-tracking control systems 
involving an acceleration component 
as well as the conventional position 
and rate controls (Searle, 1951). 

Predictiveness of tracking responses.
The assumption of predictiveness in 
tracking responses is supported by 
the following findings. First, the 
Foxboro studies showed that the 
time error for manual handwheel 
tracking is much less than the reac- 
tion time, as discussed above. Sec- 
ond, pursuit tracking usually yields 
lower error scores than compensa- 
tory tracking (Chernikoff, Birming- 
ham, & Taylor, 1955; Poulton, 1952; 
Senders & Cruzen, 1952). In the pur- 
suit mode of tracking, responses may 
be made on the basis of a predictable 
course of the target since its marker 
moves independently of the marker 
with which the tracker follows. In 
the compensatory mode, prediction 
must be limited to the tracking error 
since the tracker attempts to stabilize 
a moving marker representing the 
difference between target motion and 
his own control motion. Third, 
Chernikoff et al. (1955) found that an 
aided-tracking control impairs per- 
formance for the pursuit mode but 
materially improves it for the com- 
pensatory situation. They resolved 
this apparently paradoxical finding 
by considering the nature of aided- 
tracking controls in terms of the pre- 
dictiveness of tracking responses. 
With a position control, the position 
of the moving marker controlled by 
the tracker is directly proportional 
to the position of his control. With 
aided tracking, a movement of the 
control not only causes a propor- 
tional change in the position of the 
marker, but also introduces a change 
in its rate of motion. The aided-tracking
time constant is yielded by
the ratio of the control sensitivities. 
With the proper time constant in 
compensatory tracking, the operator 
can correct an error with a control 
motion proportional to the position 
component of the error. He thereby 
sets in changes in rate of motion in 
amounts that are correct on the aver- 
age to match the target motion. 
Use of the aided control in pursuit 
tracking requires that the tracker 
ignore target velocity and not at- 
tempt to predict future position. 
Later experiments by Chernikoff 
and Taylor (1957) have indicated an 
effect of target speed on the optimal 
time constant for both pursuit and 
compensatory tracking. 

Tracking error. With verification 
of the assumptions of intermittency 
and predictiveness for tracking per- 
formance, it is evident how differen- 
tial speed thresholds limit the 
tracker’s responses with a position 
control. It may be assumed that at a
given instant in time the tracker is 
exactly on target but that his cursor 
and the target are moving at different 
speeds. During a short period of 
time, the position error generated is 
approximately the product of this 
speed difference and the temporal 
interval. Since response intermit- 
tency holds the temporal interval 
constant, the tracking error is di- 
rectly proportional to the speed dif- 
ference which the tracker can dis- 
criminate. 
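
The argument of this paragraph reduces to a product of three quantities and can be sketched as follows. The function name, the choice of the adjacent-stimuli Weber fraction of 0.138, and the sample speeds are assumptions for illustration, not values taken from any tracking experiment.

```python
def tracking_error(speed, weber_fraction, interval=0.5):
    """Position error built up over one response interval when cursor and target
    speeds differ by one just-discriminable step (weber_fraction * speed)."""
    speed_difference = weber_fraction * speed
    return speed_difference * interval

# Hypothetical target speeds, in arbitrary linear units per second.
for speed in (2.0, 4.0, 8.0):
    error = tracking_error(speed, weber_fraction=0.138)
    print(f"speed {speed:.0f}: predicted error {error:.3f}")
```

With the interval held constant by response intermittency, doubling the target speed doubles the predicted error, which is the linear relation the paragraph describes.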

Speed of target motion seems to 
have the same effect on tracking error 
as it has on the differential speed 
threshold as measured with nonsuper- 
imposed stimuli, i.e., tracking error 
increases as a linear function of 
speed. Bowen and Chernikoff (1958) 
have investigated the relationship 
between magnification, speed of target
motion, and tracking error with a
compensatory position-control system.
Both with and without magnification,
measures of tracking performance did
not vary for a constant target speed
when the frequency and amplitude of
motion were varied over a range useful
in tracking research. Tracking error
increased with an increase in average
speed of target motion. Departures
from a linear relationship were not
large.


PREDICTIVE BEHAVIOR 
Prediction Motion 


Data from Gottsdanker’s series of 
studies of prediction motion demon- 
strate a marked similarity of pursuit 
tracking error to the differential 
speed threshold for adjacent stimuli. 
Similar to the differential threshold 
(approximately 14% of the speed) is 
the average error a tracker makes in 
following a target which moves at a 
constant speed but suddenly disappears.
During the second following
the disappearance, the tracker main- 
tains the speed with an average error 
of 13, 14, and 16%, as measured in 
three separate studies by Gottsdanker
(1952a, 1952b, 1955).

On some trials when the target was 
accelerating or decelerating at the 
moment of disappearance, the tracker 
did not continue the uniform change 
in speed. It should be noted, how- 
ever, that at the moment of disap- 
pearance the change in speed for a 
0.5-second interval was only 5 to 7% 
of the speed and presumably was be- 
low the tracker’s threshold. Gotts- 
danker (1956) has reviewed the ex- 
perimental literature on responses to 
acceleration of target motion. He 
concluded that smoothly accelerated 
motion is generally responded to as if 
the speed were constant, i.e., the 
change in speed did not exceed the 
differential speed threshold in the 
studies cited. 


Gottsdanker (1952a) has measured
the tracking error not only for dis- 
appearing targets, but also for com- 
pleted courses. The measured error 
is consistent with one calculated upon 
the basis of the assumptions of a 
0.5-second intermittency in response 
and a 14% speed threshold. The aver- 
age error in tracking a target mov- 
ing at a constant speed of 8 millime- 
ters per second was 0.50 millimeters. 
If the tracker were exactly on target 
at a given instant, his error a half- 
second later would be calculated from 
the assumptions as the product of
0.14 × 8 × 0.50, or 0.56 millimeters,
and the average error during the in- 
terval should be 0.28 millimeter. It 
may be assumed more realistically 
that the tracker was not exactly on 
target at the beginning of the in- 
terval. The average error should be 
calculated as correspondingly greater 
than the minimal value of 0.28 milli- 
meter. 
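
The calculation above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (a 0.5-second response intermittency, a 14% speed threshold, and a tracker exactly on target at the start of the interval). The function name is hypothetical, and the further assumption that the error grows linearly during the interval, so that its average is half the final value, is made here for illustration.

```python
def predicted_tracking_error(speed, weber_ratio=0.14, interval=0.50):
    """Return (end_error, mean_error) for one response interval.

    speed       -- target speed (e.g., millimeters per second)
    weber_ratio -- assumed differential speed threshold as a fraction of speed
    interval    -- assumed response intermittency in seconds
    """
    # Largest undetected difference between cursor speed and target speed.
    speed_difference = weber_ratio * speed
    # Position error accumulated by the end of the interval.
    end_error = speed_difference * interval
    # Assumption: error grows linearly from zero, so the mean is half the final value.
    mean_error = end_error / 2.0
    return end_error, mean_error


end_error, mean_error = predicted_tracking_error(8.0)
print(f"end of interval: {end_error:.2f} mm, average: {mean_error:.2f} mm")
```

For the 8-millimeters-per-second target this gives the 0.56- and 0.28-millimeter values quoted above. The measured average of 0.50 millimeter lies above the minimal value, as expected if the tracker does not begin each interval exactly on target.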


The prediction of tracking error 
from the Weber ratio for speed dis- 
criminations is not limited to visually 
presented stimuli, but may be ex- 
tended to other stimuli. Gottsdanker 
(1954) has measured the precision of 
tapping at a constant rate of two per
second. He found that subjects
could maintain this rate to an ac- 
curacy of 2.4% when the stimulus of 
pops from a magnetic tape playback 
was removed. In the Foxboro studies 
it was found that the tracker could 
utilize the increased precision of 
rapid repetitive movements in fast 
handwheel cranking over the inter- 
mittent corrective responses of slower 
handwheel turning. As an approxi- 
mation, the tracking error should be 
limited by the product of the repeti- 
tion rate threshold and the time for 
each repetitive movement at the 
faster speeds. For example, the time 
error should be the product of 0.024
and 0.25 second per repetitive movement
for cranking at 120 rpm. This
value coincides exactly with the 
measured time error of 6 milliseconds 
for the light handwheel with short 
radius. 
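
The same product can be written out as a short Python sketch. The values 0.024 and 0.25 second are taken from the text as given; the function name is hypothetical and the code only restates the approximation, not the Foxboro measurement procedure.

```python
def predicted_time_error_ms(rate_threshold=0.024, seconds_per_movement=0.25):
    """Approximate time error, in milliseconds, for one repetitive movement.

    rate_threshold       -- repetition-rate threshold as a fraction (2.4% in the text)
    seconds_per_movement -- duration of one repetitive movement, as given in the text
    """
    return rate_threshold * seconds_per_movement * 1000.0


print(f"{predicted_time_error_ms():.1f} ms")
```

With the default values the product is 6 milliseconds, the figure compared in the text with the measured time error.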


Prediction of Future Positions 
of a Moving Target 


Although the differential speed 
threshold would seem to be clearly 
related to predictions of future posi- 
tion of a moving object, data on the 
nature of the relationship are lim- 
ited. Slater-Hammel (1955) has 
had subjects observe a marker mov- 
ing at a uniform speed over different 
display distances and then had them 
estimate when the marker would 
complete traversing different target 
distances. The display distance did 
not affect the error in time of estimat- 
ing the arrival of the uniformly mov- 
ing marker at a specified point in 
space. However, the error increased 
systematically with an increase in 
the target distance which the marker 
traversed after disappearing. In 
terms of percentage of the required 
time, the error varied between 8.9% 
and 21.6%. These values agree with 
expectations based on the Weber 
ratio for speed discriminations with 
nonsuperimposed stimuli (cf. Figure 
4). 

Morin, Grant, and Nystrom (1956) 
have reported similar results despite 
two important differences in their 
experimental procedure. First, in- 
stead of Slater-Hammel’s stimulus 
which moved continuously at a con- 
stant speed, Morin et al. used the 
successive illumination of cue lights 
which were placed at even intervals 
in a horizontal row. After illumina- 
tion of the last cue light, the subject 
estimated the time it would take the 
imaginary moving object to reach a
target light. Second, the object
traveled at a rather slow computed 
speed of either 0.179 or 0.358° per 
second rather than the speed of ap- 
proximately 5° per second used by 
Slater-Hammel. Results obtained by 
Morin et al. confirmed the fact that 
the error of estimating arrival in- 
creases with target distance. Signif- 
icantly, they also found for their 
faster speed that the mean errors of 
estimation were generally less than 
10% of the computed time. When 
the speed was 0.179° per second, the 
mean errors of estimation ranged 
from 25 to 53%. These values sug- 
gest an apparent extrapolation to 
slow speeds of the data presented in 
Figure 4. 

Garvey, Knowles, and Newlin 
(1956) have measured the accuracy 
of prediction in terms of deviations 
in range and bearing between esti- 
mated and actual position plots on 
four different radar displays. They 
found that accuracy of estimated
position was a function of target
speed, i.e., the faster the motion of
the target the less accurate the
estimate. This relationship resembles
that between Δω and ω of Figure 3.
Gottsdanker and Edwards (1957) 
have studied a more complex type of 
prediction situation. Two targets 
moved down perpendicular paths 
towards an intersection but disap- 
peared before reaching it. The sub- 
ject estimated where one target 
would be when the other crossed the 
intersection. Gottsdanker concluded
that for both accelerated and
constant-speed targets the prediction
was based on relative positions at 
time of the target’s disappearance 
rather than on relative speeds or 
accelerations. 


SUMMARY 


Measurements of the differential 
speed threshold (Δω) have been
plotted against speed (ω) for comparison
stimuli which were presented
adjacent, separate, or superimposed. 
As a rough approximation, the 
threshold increases in direct propor- 
tion to speed for nonsuperimposed 
stimuli over a range from 0.1 to 20° 
per second (Δω = Kω). Although the
relationship for superimposed stimuli 
(monocular parallax) is similar, in- 
adequate ocular following movements 
and receptor intensity effects modify 
the relationship at fast speeds 
(greater than 5° per second). Estimates
of the Weber ratio (Δω/ω) of
0.138 for adjacent stimuli and of 
0.0768 for separate stimuli provide a 
basis for interpretation of tracking 
and other predictive behavior. Ex- 
periments support the assumptions of 
intermittency and predictiveness of 
responses in tracking. With these 
assumptions, error in performance 
may be calculated for relatively 
simple tasks from the Weber ratio. 
For more complex tasks, constancy 
of the Weber ratio agrees with the 
linear relationship found between 
tracking error and speed of target 
motion.


SELF-ACCEPTANCE AND SELF-EVALUATIVE BEHAVIOR:
A CRITIQUE OF METHODOLOGY 


DOUGLAS P. CROWNE AND MARK W. STEPHENS
Ohio State University 


“Self-acceptance” has become a
popular concept in psychological
literature. Along with “rigidity,”
“authoritarianism,” and “conformity,”
it has come to particular prominence
in the last decade, perhaps reflecting
an evolution in value systems
in American culture. Concepts per- 
taining to the self have been given 
considerable space in the writings of 
personality theorists and social-per- 
sonality psychologists and inevitably 
have found their way into psycho- 
logical research. 

Self-acceptance has been particu- 
larly identified with Rogers’ person- 
ality theory and is accorded the 
status in that system of a major 
therapeutic goal. Phenomenological
research on self-acceptance dates
from the classic study of Raimy
(1948). However, very similar concepts
have played dominant roles in
other theories—e.g., Snygg and 
Combs (1949), Horney (1950), and 
Sullivan (1953). More important, 
self-acceptance seems to have been 
pre-empted for less systematic, ec- 
lectic usage by a great many practic- 
ing clinicians and researchers (Cowen, 
1956; Cowen, Heilizer, Axelrod, & 
Alexander, 1957; Zuckerman, Baer, 
& Monashkin, 1956; Zuckerman & 
Monashkin, 1957). The major portion
of the research on self-acceptance
derives from Rogers’ self-theory, but
studies based on other theories (Block
& Thomas, 1955; Sarbin & Rosenberg,
1955) and the generally empirical
investigations referred to above
attest to the breadth of current
interest in the behaviors subsumed
under this broadly interpreted
construct.

1 The authors would like to express their
indebtedness to the following persons, who
critically read this paper and made a number
of valuable suggestions: Donald Campbell,
Shephard Liverant, Julian Rotter, Lee
Sechrest, Charles Smock, and Janet Taylor.

While no single definition of self- 
acceptance would be accepted by all 
who use the term, the phenomeno- 
logical view of Rogers seems to rep- 
resent at least a common point of 
departure. From the definition of a 
self-concept construct the concept of 
self-acceptance is derived, referring, 
at least operationally, to the extent 
to which this self-concept is congruent
with the individual’s description
of his “ideal self.”

The majority of self-acceptance 
tests have followed this model (see 
Table 1). A somewhat different 
psychometric model has been pro- 
posed by Gough (1955), in which self- 
acceptance is inferred from the ratio 
of “favorable” self-descriptive
statements to the total number of
self-descriptive statements made by the
subject.

A common denominator in the 
definition of self-acceptance, judging 
from the operations employed in its 
assessment, would seem to be the 
degree of self-satisfaction in
self-evaluation. This definitional
consensus, however, is achieved at the
level of operations, and other meanings
may be implied by self-acceptance
constructs. Phenomenological
theorists, for example, appear to be
interested in an “internal” phenomenal
state. Other theorists
(Block & Thomas, 1955) have formulated
self-acceptance as a function of
an ego-control construct. The
phenomenological concept of Rogers and
the psychoanalytic set of meanings
implied by Block and Thomas’ construct
of ego control probably diverge
in important respects. The purpose
here, however, is merely to illustrate
the point that emphasis on definitional
clarity achieved at an operational
level tends to ignore the
probably significant differences in the
implied theoretical meanings of
self-acceptance.


TABLE 1
CLASSIFICATION OF SOME TESTS OF SELF-ACCEPTANCE

Name of Test: SIO (self-ideal-other) Q sort (Rogers & Dymond, 1954)
Type: Q sort
Score Obtained: Pearson correlation between sorts of self and ideal
on 100 items. Also, “adjustment score” based on number of favorable
statements placed on “like me” end of distribution and number of
unfavorable statements placed on “unlike me” end.

Name of Test: Index of Adjustment and Values (Bills, 1958; Bills,
Vance, & McLean, 1951)
Type: Adjective rating scale
Score Obtained: Self-acceptance score = sum of self-concept ratings
(1-5 scale) on 49 traits. Also, a self-ideal discrepancy score is
calculated. Norms available.

Name of Test: Adjective Check-List (Gough, 1955)
Type: Adjective check list
Score Obtained: Self-acceptance score = number of favorable adjectives
checked divided by total number of adjectives checked.

Name of Test: Buss scale (Buss & Gerjuoy, 1957; Zuckerman &
Monashkin, 1957)
Type: Adjective check list
Score Obtained: Sum of differences without regard to sign of scale
values (based on psychologists’ ratings) of adjectives checked on self
and ideal descriptions.

Name of Test: Self-Rating Inventory (Brownfain, 1952)
Type: Self-rating scale
Score Obtained: “Positive self-concept” and “negative self-concept”
scores. Self-acceptance = sum of positive self-concept description
weights minus negative self-concept description weights, disregarding
sign.

Name of Test: Attitudes toward Self and Others Questionnaire
(Phillips, 1951)
Type: Self-rating scale
Score Obtained: Sum of weights (1-5) on each item. Norms available.

Name of Test: Berger Self-Acceptance scale (Berger, 1952)
Type: Self-rating scale
Score Obtained: Sum of item weights (1-5).

Name of Test: Interpersonal Check List (LaForge & Suczek, 1955)
Type: Adjective check list
Score Obtained: Intensity scale values for each adjective (1-4).
Self-acceptance = discrepancy between self and ideal ratings.

Reflecting in part the widespread 
interest in self-acceptance are the 
numerous instruments which have 
been devised to measure the con- 
struct. A striking phenomenon of 
research in this area is that these 
tests, characterized by a diversity of 
both theoretical and psychometric 
models, have apparently been assumed
to be interchangeable. Thus,
characteristic of self-acceptance
research appears to be a basic conception
that measures of this construct
possess face validity: that is, in a 
simple denotative sense, the tests 
are viewed as being manifestly similar 
(Peak, 1953). 

Criterion validation of self-accept- 
ance tests is, of course, logically im- 
possible, and attempts at construct 
validation do not lend much faith in 
the validity even of a particular test, 
much less of all the different tests. 
Face validity, however, has appar- 
ently been assumed without question. 
The acceptance of face validity— 
that is, manifest similarity—implies 
adherence to a further assumption 
incorporated in phenomenological
theory, that of the validity of
self-reports (Rogers, 1951, p. 494). In
terms of these assumptions, a
self-acceptance test is valid if it looks
like a self-acceptance test and is similar
to other tests, and what a person says
about himself self-evaluatively is
accepted as a valid indication of how
he “really” feels about himself.

The acceptance of these assump- 
tions, whether acknowledged or im- 
plicit, has definite implications for 
the assessment of self-acceptance and 
for the interpretation of experimental 
results in this area. This paper will 
show that there are four major prob- 
lems in the measurement of this con- 
struct and that, in view of the com- 
mon adherence to these assumptions, 
the results of studies on self-accept- 
ance are rendered highly ambiguous. 
These issues seem, despite their 
essential pertinence to research on 
self-acceptance, to have been suffi- 
ciently ignored to warrant exposition 
in this paper. It will be seen that 
these issues are not limited solely to 
self-acceptance, but represent in- 
stead basic logical and psychometric 
considerations which may serve to
illustrate problems in personality
research in general.


EQUIVALENCE OF OPERATIONS 


As observed above, the diverse 
tests of self-acceptance have been 
assumed to be equivalent operations 
for measuring behaviors subsumed 
under the construct. The failure of 
experimenters to consider the problem 
of the equivalence of assessment 
operations in published reports (Bills, 
Vance, & McLean, 1951; Block & 
Thomas, 1955; Calvin & Holtzman, 
1953; Cowen, Heilizer, & Axelrod, 
1955; Hillson & Worchel, 1957; 
Phillips, 1951) raises the question of 
the basis on which the findings of 
individual studies employing differ- 
ent measuring operations are gen- 
eralized and incorporated in the 
larger body of self-acceptance re- 
search. The basis of generalization, 
in view of the absence of explicit con- 
sideration of the question, must be 
inferred to lie in the assumption of 
face validity as defined above. Even 
statements implying differences
among self-acceptance tests fail to 
deal with the logically sequential 
question of the extent to which these 
differences may mean that self-ac- 
ceptance as measured by Test 1 is not 
the same as self-acceptance as meas- 
ured by Test 2. The following excerpt 
illustrates this point (Cowen et al.,
1955): 

Presumably each of these classes of [self- 
acceptance] measures has certain peculiar 
advantages and limitations. .. . In any case, 
a good many data have now been presented 
demonstrating some empirical validity for 
both types of measures (i.e., they can discrim- 
inate among subjects with respect to other 
personality and behavioral indices in a manner
roughly consistent with predictive expectations
based on phenomenological theory)
(p. 242).


These writers do not make clear 
what relationship obtains between
the classes of self-acceptance tests
(tests yielding discrepancy scores 
versus self-concept rating devices) 
or, more basically, how phenomeno- 
logical personality theory can lead to 
operations that apparently can satisfy 
certain predictions in the case of one 
class of instruments but requires 
different operations to obtain posi- 
tive results from other hypotheses 
based on the same construct. 

According to the notion of face 
validity, what looks like a test of self-
acceptance is such, by definition. All
the test constructor is required to do, 
in terms of this criterion, is to elicit 
self-evaluative statements from sub- 
jects. All measures that conform to 
this requirement achieve validity and 
are therefore equivalent. By this 
procedure the test itself becomes the 
construct, in the sense of the narrow- 
est kind of operational definition. 

An operational definition stating 
what is measured by a given device 
or procedure in terms of specified 
measurement operations is, of course, 
a perfectly legitimate and necessary 
procedure in scientific investigation 
as long as the interpretation of results 
is strictly confined to the particular test 
or measurement procedure. A problem 
arises, however, when an attempt is 
made to generalize from experimental 
findings with a particular test to re- 
sults obtained by different assessment 
operations. The problem similarly 
occurs in another case when a certain 
test is applied to an experimental 
problem and negative results are 
interpreted as disconfirming the hy- 
potheses relating the construct to 
observables. As Jessor and Ham- 
mond (1957) have pointed out, in the 
absence of an explicit, logical rela- 
tionship between the superordinate 
construct and the operations designed
to assess it, conclusions cannot
be made concerning the validity
of the hypotheses since invalid
measurement operations could equally
account for negative findings. 

The point at issue is that tests of 
self-acceptance (or, for that matter, 
of any construct) which are based on 
different construct systems and in 
the development of which different 
procedures and items have been em- 
ployed are not equivalent in the ab- 
sence of empirical demonstration of 
their relationships; they must be 
shown to be either highly related to 
each other or similarly related to 
other constructs in the nomological 
net. Further, in the absence of dem- 
onstrated equivalence, experimental 
results cannot be generalized to find- 
ings with a different instrument. 
This seems to be so obvious a con- 
sideration that explication here is 
redundant. The fact remains, how- 
ever, that the equivalence of self- 
acceptance tests has been assumed 
despite their independent derivation 
and despite the relative lack of em- 
pirical demonstration that there is a 
high degree of common variance 
among them. 

In respect to the latter point, three 
studies are of interest. Bills (1958) 
reports a correlation of .24 between 
the self-concept score on the Index of 
Adjustment and Values (IAV) and 
the “‘self-score’’ of the Phillips Atti- 
tudes Toward Self and Others Ques- 
tionnaire (1951). A correlation of .56 
is reported between the Bills self- 
ideal discrepancy score and the 
Phillips self-score. Omwake (1954) 
found a correlation of .55 between 
the IAV self-acceptance (self-ideal 
discrepancy) score and the self-score 
on the Phillips questionnaire and a 
correlation of .49 between the self- 
acceptance score on the IAV and the 
Berger self-acceptance scale (Berger, 
1952). In a recent study, Cowen 
(1956) found that two self-acceptance
measures yielding self-ideal discrepancy
scores (Bills IAV and the
Brownfain Self-Rating Inventory) 
were uncorrelated. The magnitude 
of these correlations indicates that 
the prediction of scores on one of these 
measures from scores on another 
would be accompanied by a wide 
margin of error. 
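
How wide this margin is can be illustrated with a short Python sketch. The correlations are those reported above; the functions and their names are illustrative, and the sketch assumes ordinary linear regression, under which the standard error of estimate, expressed as a fraction of the standard deviation of the predicted score, is the square root of one minus the squared correlation.

```python
import math

# Correlations between self-acceptance measures as reported in the text.
reported_correlations = {
    "IAV self-concept vs. Phillips self-score (Bills, 1958)": 0.24,
    "IAV discrepancy vs. Phillips self-score (Bills, 1958)": 0.56,
    "IAV discrepancy vs. Phillips self-score (Omwake, 1954)": 0.55,
    "IAV self-acceptance vs. Berger scale (Omwake, 1954)": 0.49,
}


def shared_variance(r):
    """Proportion of variance common to the two measures."""
    return r ** 2


def relative_error_of_estimate(r):
    """Standard error of estimate as a fraction of the criterion SD.

    Assumes prediction by simple linear regression.
    """
    return math.sqrt(1.0 - r ** 2)


for label, r in reported_correlations.items():
    print(f"{label}: r = {r:.2f}, "
          f"shared variance = {shared_variance(r):.2f}, "
          f"error of estimate = {relative_error_of_estimate(r):.2f} SD")
```

Even the largest of these correlations, .56, leaves roughly two-thirds of the variance unshared, so that the error in predicting one score from the other remains more than four-fifths of a standard deviation.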

The diversity of item selection 
procedures, item content, type of 
response elicited, and test format 
which is characteristic of test con- 
struction in this area suggests that 
what is operationally defined as self- 
acceptance on one test may be quite 
different from the sample of self- 
evaluative behavior elicited in an- 
other psychometric situation. Fur- 
ther, self-acceptance is construed 
differently by different theorists (cf. 
Block & Thomas, 1955; Butler & 
Haigh, 1954; LaForge & Suczek, 1955; 
Sarbin & Rosenberg, 1955), and
these definitional differences are
undoubtedly reflected in self-acceptance
tests.

Even if one grants the assumption 
of face validity with its clearly im- 
plied meaning of equivalence as made 
by the experimenter, to assume that 
subjects will perceive these psycho- 
metric situations in the same way is 
another matter. It is quite conceiv- 
able that subjects may categorize the 
self-evaluative situations represented 
by the various tests of self-acceptance 
quite differently, with the result that 
scores obtained on these measures 
will not be congruent. According to 
this argument, a subject’s expectan- 
cies that his goals will be achieved or 
frustrated as a result of his sorting a 
number of statements on a forced-
choice distribution from “like me” to
“unlike me” (Butler & Haigh, 1954) 
may be quite different from the ex- 
pectancies aroused by a situation in 
which he is asked to attribute certain
adjectival characteristics to himself
(Gough, 1955). Ironically, a phe- 
nomenological definition of a self- 
report variable is particularly obli- 
gated to account for differences in 
the subject’s perception of the meas- 
urement device. In any case, unless 
it can be shown that there is a high 
degree of congruence of the various 
measures within the experimental 
populations sampled, one is without 
means of measuring self-acceptance 
as phenomenologically defined. The 
individual’s private, unique experi- 
ence of self-satisfaction or dissatis- 
faction remains, indeed, private. 

It seems highly probable that dif- 
ferences among self-acceptance tests 
plus the likelihood that subjects will 
categorize these tests differently may 
result in the sampling of relatively 
nonoverlapping behaviors by the 
various tests. To be recognized is the 
fact that this is an empirical problem 
for which, to the writers’ knowledge, 
the three studies cited above provide 
the only suggestive evidence.² The
recently proposed model (Campbell 
& Fiske, 1959) for assessing conver- 
gent and discriminant validity would 
seem to be highly appropriate for
determining the tenability of the
assumption of equivalence of operations
for measuring self-acceptance.

2 Since the completion of this article, further
research has been published which bears
directly on the problem of the equivalence of
self-acceptance tests and suggests that a
socially desirable response set may constitute a
major source of variance (Crowne, Stephens,
& Kelly, 1961).


DEFINITION OF THE CONSTRUCT
Specifying Parameters


The ability to reach generalized
conclusions from current self-acceptance
research seems to be limited by a
failure to give adequate definitions
to the construct itself. As Rotter
(1954) has pointed out, it is important
to distinguish between ideal,
theoretical, and operational defini- 
tions of a given construct. An experi- 
menter can define self-acceptance, 
for example, as the behavior sample 
(or as the “internal” phenomenal
state reflected by the behavior sample)
obtained on a particular test. But he
is usually not interested in restricting
his interpretation of his findings (if
any) to this limited behavior sample,
and he seeks to place his results in
the larger context of research by
other investigators and to generalize
his findings to “real life” situations
such as those encountered in clinical
practice. By a narrow interpretation
of operationism, the experimenter has
made it logically indefensible to relate
his findings to a theoretical system,
to results obtained with other
measurement devices, or to “real
life” situations. When nothing more
than an operational definition is
offered, the parameters defining the
variable are not specifiable, and there
is no basis for generalization of the
results.

At the other extreme, definitions
of self-acceptance at an abstract
level, not specifically articulated with 
other variables in a theory or tied 
to a specific test, are apt to be seman- 
tically loose and to be subject to 
differing interpretations. It is true, 
of course, that definitions of variables 
at this level transcend any particular 
set of operations and can usually be 
applied to an infinite variety of situa- 
tions and behaviors. The looseness of 
such definitions, however, precludes 
rigorous tests of hypotheses and 
makes precise communication impos- 
sible. In self-acceptance research 
there have been few if any definitions 
of the construct which are not either 
rigidly operational or highly abstract. 

The deduction from an abstract
definition, with all its surplus meanings,
to specific operations is likely to
be a tenuous one and, perhaps more 
often than not, is a private, nonre- 
peatable process. An intervening 
step is necessary in which the con- 
struct is broadly defined in terms of 
specific behavioral referents and pref- 
erably in relation to other variables 
in a specific theory. A “working
definition,” as Rotter has defined it,
clearly represents an attempt to
specify the parameters of the variable
in question so that both generality
and precise communication are gained.
Self-acceptance research appears to 
have lacked such definitions. 

Although this paper is chiefly con- 
cerned with pointing out certain 
methodological pitfalls in research on 
self-acceptance, some clarification
may be achieved by defining briefly 
this intermediate theoretical step and 
attempting to relate the logic of con- 
struct validation to the more general 
theoretical problem. Rotter’s work- 
ing definition could be described as a 
definition at the construct level. In 
terms of this view, the behavioral 
referents and the hypothesized relation- 
ships of the construct are described 
as part of its definition—that is, the 
implied meanings of the term are 
publicly specified. In effect, specify- 
ing the behavioral referents and 
hypothesized relationships reduce to 
the same thing: locating the construct 
in a nomological net. In the language 
of test construction, Cronbach and 
Meehl (1955) write:

Construct validation takes place when an
investigator believes that his instrument
reflects a particular construct to which are
attached certain meanings. The proposed
interpretation generates specific testable
hypotheses, which are a means of confirming or
disconfirming the claim. . . . To validate a
claim that a test measures a construct a
nomological net surrounding the concept must
exist [italics added] (pp. 290-291).

The logic of construct validation cannot
be invoked to justify the identification
of a particular set of operations
as unique to a given construct,
nor does it support the view that a 
construct is “‘validated’’ by the con- 
firmation of a single hypothesis. The 
establishment of a single relationship 
belongs more properly in the domain 
of criterion oriented validity, as Cron- 
bach and Meehl point out. With con- 
struct validation procedures clearly 
at issue, it would seem to be desirable 
to specify in advance the referents of 
self-acceptance. When the situations 
in which the behaviors subsumed 
under the construct and the behaviors 
themselves are identified, some idea 
of the generality and functional unity 
of self-acceptance is afforded, and 
relationships to other constructs, 
situations, and measurement opera- 
tions can be suggested at a logical 
level. 

Underwood (1957) has described 
the difficulty in moving from theo- 
retical definitions (or constructs) to 
operational definitions—a difficulty 
that appears to be characteristic of 
psychological research. Campbell 
and Fiske (1959) have extended 
Underwood’s point to show that the 
transition from operations to con- 
struct can involve perplexities equally 
difficult. The essence of the latter 
problem is that a single set of opera- 
tions is capable of multiple interpre- 
tations; convergence on a single inter- 
pretation (that is, establishing that a 
relationship holds in a particular
nomological net and cannot be more 
adequately accounted for in another 
net) is achieved by a process of tri- 
angulation from a number of different 
operations. Convergent validation, 
however, involves complex designs 
and extensive preliminary research 
efforts. Further, convergent valida- 
tion does not necessarily help to make 
more explicit the descent from a
theoretical model to measurement
operations. According to the present view
of definition at the construct level, 
this explicitness would be achieved 
and the reverse problem, that of 
interpreting results from a set of 
operations, might be at least partially 
solved. That is, alternative explana- 
tions of experimental findings could 
be examined in the light of the hy- 
pothesized relationships proposed in 
different construct systems claiming 
to explain the same body of data, 
with the result that incomplete or 
inconsistent interpretations might 
be discarded in favor of interpretations
whose “fit” to the data is more
adequate. 

For example, phenomenological 
theory implicitly hypothesizes a lin- 
ear relationship between self-accept- 
ance and adjustment (Butler & 
Haigh, 1954), while acknowledging 
the possibility that very high reported
self-acceptance may indicate
“defensive” unwillingness to reveal
personal dissatisfaction. Block and
Thomas (1955), however, have shown 
that a curvilinear model, in which 
both very high and very low self- 
acceptance are associated with mal- 
adjustment, affords a better explana- 
tion of the phenomenon of defensive- 
ness. It is conceivable that more ex- 
plicit formulation of the phenomeno- 
logical self-acceptance construct and 
its derived test procedures might 
have provided a more adequate ex- 
planation of defensive responding in 
the Butler and Haigh study. More 
precise definition of the variable in 
question might thus have directed a 
search for operations less susceptible 
to systematic response bias. 
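
The contrast between the linear and the curvilinear model can be made concrete with a small sketch. The Python below uses invented scores (not data from Butler and Haigh or from Block and Thomas) in which adjustment peaks at moderate self-acceptance; the names `pearson`, `self_acceptance`, and `adjustment` are illustrative assumptions. Under these assumptions a linear index of association misses a relationship that a simple curvilinear term picks up.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores: adjustment is highest at moderate self-acceptance.
self_acceptance = [1, 2, 3, 4, 5, 6, 7, 8, 9]
adjustment = [2, 4, 6, 7, 8, 7, 6, 4, 2]

# Linear model: adjustment rises steadily with self-acceptance.
linear_r = pearson(self_acceptance, adjustment)

# Curvilinear model: adjustment falls off with squared distance
# from the middle of the self-acceptance range.
midpoint = sum(self_acceptance) / len(self_acceptance)
closeness_to_middle = [-(x - midpoint) ** 2 for x in self_acceptance]
curvilinear_r = pearson(closeness_to_middle, adjustment)

print(f"linear r = {linear_r:.2f}, curvilinear r = {curvilinear_r:.2f}")
```

With real data the choice between the two models would of course rest on a formal comparison of fit rather than on two correlation coefficients.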

In a recent paper, Cowen and 
Tongas (1959) have reviewed a num- 
ber of construct validation studies on 
the IAV (Bills, 1958). They point to 
the fact that several of these studies 
have reported significant results in 
the direction opposite to theoretical 
expectation. In one study, on 10 of 21 
hypotheses specifying differences be- 
tween high and low self-acceptance 
scorers, many differences were found 
which indicated that subjects with 
high self-acceptance scores were more 
maladjusted than low scorers (Bills, 
1953a). As Cowen and Tongas ob- 
serve, high self-acceptance should 
theoretically be associated with satis- 
factory adjustment, not maladjust- 
ment. Another theoretical incon- 
sistency occurred in the failure to 
show that lowered self-concept rat- 
ings and longer response times in 
word association are associated with 
conflict and emotionality. The re- 
sults of this study were again, in fact, 
significant in the opposite direction 
(Bills, 1953b). Bills interpreted these 
findings as indicating a decrease in 
defensiveness. Cowen and Tongas 
argue, however, that: 


Unless procedures can be specified before 
the fact, by which we can discriminate the 
high SC (self-concept) score representing good 
adjustment from the high SC score represent- 
ing defensiveness, we are operating within a 
closed system in which the results of a given 
experiment, irrespective of their direction, 
can be interpreted as confirming the under- 
lying theory (pp. 362-363). 


Self-acceptance research is in need 
of clear construct-level definitions in 
which the relationships of the con- 
struct to other variables are explicitly 
stated. These definitions must refer 
primarily to the relationship of self- 
acceptance to other variables in the 
general theory in which the construct 
is embedded. Depending upon the 
particular theory, definitions might 
specify the nature of the relationship 
of self-acceptance to adjustment; to 
such personality variables as creativ- 
ity, neuroticism, and defensiveness; 
to interpersonal variables such as 
acceptance of others; to environ- 
mental, social, and cultural variables 
as, for example, the role of cultural 
sanctions in self-evaluation, or the 
influence of the experimental (or 
therapeutic) context on self-appraisal. 


Representative Sampling 


A second problem associated with 
the definition of the parameters of 
self-acceptance concerns the rep- 
resentative sampling of self-accept- 
ance test items. As applied to the 
construct of self-acceptance, the prob- 
lem of representative sampling is 
involved in the systematic sampling 
of some specified universe of self- 
evaluative behaviors. Assuming that 
one has defined this population theo- 
retically, it is then of importance to 
draw one’s sample of test items in 
such a way as to represent their oc- 
currence in the population. The 
achievement of representative sam- 
pling in this respect means that gen- 
eralization can reasonably be at- 
tempted to other situations and/or 
behaviors than those of a particular 
experiment or test. Although the 
behavioral referents of self-accept- 
ance might seem obvious, on closer 
scrutiny it appears that there is 
notable confusion resulting from a 
lack of consensus as to what these 
referents are. 

Some examples from published 
research may illustrate what is im- 
plied by failure to sample representa- 
tively a population of self-evaluative 
behaviors. Butler and Haigh (1954) 
begin with Rogers’ abstract defini- 
tion of the self-concept. Then, they 
write: 

A set of one hundred [self-reference] state- 
ments was taken at random from available 
therapeutic protocols. (Actually, the state- 
ments were selected on the basis of accidental, 
rather than random, sampling) (p. 57). 


The population of relevant self- 
percepts was therefore restricted to 
those verbalized by some sample of 
clients in client centered therapy, the 
basis for sampling was accidental, 
and thus there is no precise definition 
of self-acceptance in terms of what 
particular self-percepts define its 
parameters. The finding that changes 
in self-acceptance were demonstrated 
to occur as a function of client cen- 
tered therapy is thereby limited to 
the particular conditions of this ex- 
periment, the subject population 
used, and the particular items em- 
ployed in the Q sort measure. For 
example, it is quite possible (but un- 
known) that the statements used 
comprise a sample biased in favor of 
client centered counseling as per- 
ceived and defined by the judges 
(presumably Butler & Haigh) who 
selected the items. 

A second example can be seen in 
the development of the IAV (Bills 
et al., 1951). The items (adjectives) 
in the IAV were drawn from Allport 
and Odbert’s (1936) list of 17,953 
traits. The basis of selection was the 
frequent appearance of the adjective 
in question in client centered inter- 
views and whether it presented a 
“clear example of self-concept defini- 
tion.” Self-evaluation on the IAV, 
then, pertains only to the Allport and 
Odbert traits mentioned frequently 
in client centered interviews, and 
generalization to other self-evalua- 
tive situations, or traits, would be 
tenuous. 

Gough’s (1955) Adjective Check 
List (ACL) affords a third illustra- 
tive example. The ACL consists of 
300 adjectives selected from Cattell’s 
(1943, 1946) consolidation and fac- 
torization of the Allport and Odbert 
trait list. The basis on which the 300 
adjectives in the ACL were derived 
from Cattell’s list of 171 trait vari- 
ables is not specified. In addition, 
the assumptions both of Allport and 
Odbert in their original derivation of 
the trait list and of Cattell in his 
factorization are further restrictions 
in interpreting ACL scores. 

With such lists of traits or items 
it is necessary to assume either that 
they truly represent all self-percepts, 
or at least that they represent the 
most important ones. But, especially 
for the phenomenologist, must it not 
be assumed that these are different 
for different subjects and/or subject 
populations? Must not this list, then, 
be tailor-made to the subject to be 
truly representative for him (a totally 
idiographic procedure)? Perhaps 
what is required is that the subject 
generate his own list of self-descrip- 
tions, or a self-description, and the 
values he attaches to the separate 
elements and to the composite. 
Kelly’s (1955) Role-Construct Reper- 
tory Test appears to fit this model. 

It would seem possible to achieve 
some degree of representativeness in 
the sampling of a defined universe of 
self-reference items. The definition 
of the population is properly referable 
to the theory in which the self-ac- 
ceptance construct is embedded. 
That is, one should be able to deduce 
from the theory the nature of the 
items to be sampled (although, from 
a phenomenological theory, one might 
protest that this population of items 
is unique to the individual; but this 
only thickens the soup). Not only 
should the population of subjects be 
specifiable (for example, the theory 
has particular relevance to persons in 
client centered therapy), but what 
constitutes a relevant self-evaluative 
statement (that is, the basis for self- 
evaluation) should be deducible as 
well. The relative adequacy of theo- 
ries employing self-acceptance con- 
structs is clearly at issue in this case. 

With regard to the problem of 
sampling a defined universe, one 
approach has been suggested by 
Crowne (1959). Definitions of self- 
accepting and self-derogatory be- 
havior from the point of view of 
social learning theory (Rotter, 1954) 
were given first to psychologist 
judges and then to judges drawn from 
the subject population (introductory 
psychology students) to which gen- 
eralization was intended. The psy- 
chologists were asked to generate 
from these definitions lists of self- 
evaluative behaviors—that is, be- 
havioral referents, or cues, of self- 
acceptance and self-rejection—com- 
mon in such a subject population. 
Subject judges were given a list of 
300 adjectives (actually, the ACL) 
and asked to rate each adjective in 
terms of the extent to which they 
felt that, if it were checked by one of 
their peers as descriptive of himself, 
self-acceptance or self-rejection would 
be indicated. Items were then se- 
lected on the basis of high interjudge 
agreement of both psychologist and 
subject judges. In this way the 
items were tied to, and representa- 
tive of, both the superordinate theory 
and the specific population of self- 
evaluative behaviors common to the 
experimental population. This pro- 
cedure was still limited, however, to 
the extent that the list of 300 adjec- 
tives failed to represent some clearly 
defined universe. Generalizing the 
procedures used in this study, it would 
be possible to elicit descriptions of 
self-acceptance and self-rejection (the 
definitions for the judges being de- 
rived from theory) from a large 
sample of judges drawn from the 
appropriate population. Items might 
then be selected from descriptive 
units on which there was high inter- 
judge agreement. The methodologi- 
cal and psychometric considerations 
proposed by Jessor and Hammond 
(1957) would presumably dictate 
the form of the scale, type of re- 
sponse, and related aspects of test 
construction. 
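
The selection rule described above, retaining only items on which both groups of judges agree, can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the adjectives, the two-category ratings, the helper names `modal_rating` and `select_items`, and the 80% agreement cutoff are hypothetical and are not taken from Crowne (1959).

```python
from collections import Counter

def modal_rating(ratings):
    """Return (most frequent rating, proportion of judges who gave it)."""
    rating, count = Counter(ratings).most_common(1)[0]
    return rating, count / len(ratings)

def select_items(psychologist_ratings, subject_ratings, threshold=0.8):
    """Keep items on which both judge groups agree, within and across groups."""
    selected = {}
    for item, psych in psychologist_ratings.items():
        if item not in subject_ratings:
            continue
        psych_mode, psych_agree = modal_rating(psych)
        subj_mode, subj_agree = modal_rating(subject_ratings[item])
        # Both groups must favor the same category, each with high agreement.
        if psych_mode == subj_mode and min(psych_agree, subj_agree) >= threshold:
            selected[item] = psych_mode
    return selected

# Hypothetical ratings: "A" = indicates self-acceptance, "R" = self-rejection.
psychologists = {
    "confident": ["A", "A", "A", "A", "A"],
    "worthless": ["R", "R", "R", "R", "A"],
    "quiet": ["A", "R", "A", "R", "R"],
}
students = {
    "confident": ["A", "A", "A", "A", "R"],
    "worthless": ["R", "R", "R", "R", "R"],
    "quiet": ["R", "A", "A", "R", "A"],
}

selected = select_items(psychologists, students)
print(selected)
```

In this toy example the ambiguous adjective is dropped because neither group of judges reaches the cutoff and the two groups favor different categories.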


This section has been concerned 
with two problems related to the 
definition of the parameters of self- 
acceptance: (a) the necessity of pro- 
viding a definition at the construct 
level, in which the behavioral refer- 
ents of self-acceptance are specified 
and the construct located in a nomo- 
logical net; and (b) the need to con- 
sider the representativeness of the 
sampling of a population, as defined 
in a theory, of self-reference statements or 
items. Failure to meet these criteria 
results in the inability of the experi- 
menter or test constructor legiti- 
mately to generalize from the par- 
ticular conditions (subjects and stim- 
uli) of his experiment or test. 


SOCIAL DESIRABILITY 


The third general issue to raise 
concerns the extent to which self- 
evaluative responses are influenced 
by “defensive behavior” (Butler & 
Haigh, 1954; Zuckerman & Monash- 
kin, 1957), “self-protective response 
tendencies” (Crowne, 1959), or “so- 
cial desirability” (Edwards, 1957; 
Kenny, 1956). It is important, how- 
ever, first to consider whether these 
terms refer to the same or different 
phenomena. 

Butler and Haigh apply the term 
“defensive responding” to the re- 
sponses of those individuals who do 
not reveal the extent of their self- 
dissatisfaction and who, by other 
criteria, would be judged as malad- 
justed. (These authors thus seem to 
reject, for some subjects at least, the 
assumption of validity of self-reports, 
although how this can be done within 
a phenomenological frame of refer- 
ence is hard to understand.) “De- 
fensiveness” has been used by 
Zuckerman and Monashkin to refer 
to the phenomenon whereby “The 
person who is self-satisfied is likely 
to answer MMPI items in a way 
which he considers personally and 
socially desirable” (p. 147). Crowne 
used the term “self-protective be- 
havior” to refer to the unwillingness 
of some individuals to acknowl- 
edge self-dissatisfaction. These three 
terms, then, have been used to refer 
to highly similar phenomena. 
“Social desirability” as defined by 
Edwards (1957), however, refers pri- 
marily to the 

scale value for any personality statement such 
that the scale value indicates the position of 
the statement on the social desirability con- 
tinuum (p. 3). 


It also applies, as Edwards further 
points out, 


to the tendency of all subjects to attribute 
to themselves, in self-description, personality 
statements with socially desirable scale values 
and to reject those with socially undesirable 
scale values (p. vi). 


Whereas the above concepts of “de- 
fensiveness” have been applied to the 
motivation, presumably greater for 
some subjects than for others, to con- 
ceal self-dissatisfaction, Edwards’ no- 
tion of “social desirability” refers to 
a characteristic of items—that is, 
their location on a continuum of 
social desirability, which determines 
the proportion of subjects who will 
attribute the characteristics to them- 
selves. 

Butler and Haigh, and also Zucker- 
man and Monashkin, conclude that 
subjects who are unwilling to attrib- 
ute undesirable characteristics to 
themselves or confess self-dissatis- 
faction are by that very fact malad- 
justed, and presumably therefore 
self-dissatisfied. This, however, is 
obviously an hypothesis for investiga- 
tion, and not necessarily true by 
definition. Self-acceptance tests do 
not directly indicate whether the 
subject is willing to express self- 
discontent, but only whether he does 
express it. Zuckerman and Monash- 
kin have also suggested, in fact, that 
subjects giving more socially unde- 
sirable responses may have a differ- 
ent conception of what is socially 
desirable, and thus they implicitly 
suggest that these subjects may ac- 
tually not differ in terms of their need 
to respond in a socially desirable 
fashion. Such a difference in concep- 
tion of what is socially desirable 
might be expected to be associated 
with maladjustment, but it would 
certainly be a less direct indication 
of self-dissatisfaction per se. 

Four separate hypotheses could be 
advanced concerning the relationship 
between social desirability and re- 
sponses on self-acceptance (or any 
other self-report) tests. Each of these 
is capable in some degree of being 
tested. 

Hypothesis I. Social desirability 
has no effect on test responses. This 
is essentially the assumption of 
validity of self-reports: that what the 
subject says about himself is a valid 
and direct indication of what he feels 
or thinks, at least at the time, about 
himself. This, incidentally, seems to 
be a necessary assumption for phe- 
nomenologists, although it is a test- 
able proposition. 

Hypothesis II. Social desirability 
factors account for equal variance in 
all subjects’ test scores. This assump- 
tion is tenable from Edwards’ ap- 
proach and could be held even in the 
face of most of the research to be 
reported below. It posits, in effect, 
that once one has accounted for 
variance due to nomothetically de- 
termined social desirability in any 
subject’s test score, what is left indi- 
cates the subject’s true self-feelings. 

Hypothesis III. Social desirability, 
while it may or may not be an im- 
portant factor for all subjects, ac- 
counts for more of the variance for 
some subjects than for others. This 
corresponds to the suggested differ- 
ences in need to perform in a socially 
desirable way, protect the self, and 
disguise self-discontent. It is inter- 
esting that such need has been sup- 
posed to be an important variable 
only for those who show relatively 
high self-acceptance or social de- 
sirability scores: the rebel, or the 
individual seeking succorance, may 
produce very low scores, as a result of 
a complementary need to perform in 
a socially undesirable way, and still 
not necessarily differ from others in 
terms of over-all adjustment or 
“true” self-acceptance. In any case, 
such a conception as this suggests 
research determining the correlates 
of this need to perform in a socially 
desirable, or to perform in a socially 
undesirable, way. 

Hypothesis IV. Variance associ- 
ated with a nomothetically deter- 
mined social desirability factor re- 
flects differences in the conception of 
what is socially desirable. This hy- 
pothesis is not necessarily in conflict 
with Hypothesis III: both factors 
could operate simultaneously, al- 
though separating the variance due 
to each might be quite difficult. This, 
as well as Hypothesis III, is definitely 
incompatible with Hypotheses I and 
II. 

With the above distinctions in 
mind, then, the results of some in- 
vestigations of the relationship of the 
social desirability variable to self- 
acceptance test scores can be ex- 
amined. Kenny (1956) gave 25 self- 
descriptions previously employed in 
a study by Zimmer (1954) to a group 
of judges for social desirability scal- 
ing. Three independent samples of 
subjects then responded to these 
items in the form of a questionnaire, 
a self-descriptive rating scale, and a 
Q sort. The correlations between the 
social desirability scale values and 
the scores obtained on the question- 
naire, rating scale, and Q sort were 
.82, .81, and .66, respectively. The 
last two correlation coefficients are 
based on “real self” scores. Social 
desirability correlated .82 with the 
“ideal self” rating scale score and .59 
with the “ideal self” Q sort. 

Edwards (1955, 1957) and Edwards 
and Horst (1953) have also shown 
that Q sorts are highly influenced by 
the social desirability variable. In a 
study reported in 1955 and reviewed 
in 1957, Edwards found correlations 
of .84 and .87 for males and females, 
respectively, between item place- 
ment on a Q sort and the social de- 
sirability scale values of the items. 
In this case, the items were those 
employed in the development of the 
Edwards Personal Preference Sched- 
ule (1953). 

In one study (Kogan, Quinn, Ax, & 
Ripley, 1957), a social desirability 
scale value-response correlation of 
.67 was found in a hospitalized psy- 
chiatric patient sample diagnosed as 
psychoneurotic. The correlation in a 
control group of male college students 
was .85. It is interesting to speculate 
upon the possible significance of the 
difference in the magnitude of the 
correlation between self-description 
and social desirability values found 
for the patient and nonpatient groups. 
Perhaps, as Hypothesis IV proposes, 
maladjusted persons have different 
conceptions of social desirability in 
self-evaluative situations. 

Studies by Berger (1955), Block 
and Thomas (1955), and Zuckerman 
and Monashkin (1957) are also rele- 
vant to the problem of social desir- 
ability. These studies investigated 
the relationships between self-accept- 
ance and the clinical and “validity” 
scales of the MMPI. Employing 
different subject populations—col- 
lege undergraduate students in the 
first two studies and hospitalized 
psychiatric patients in the latter 
investigation—and different meas- 
ures of self-acceptance, there was 
nevertheless considerable agreement 
in the findings. Self-acceptance was 
found to be significantly negatively 
correlated with a number of the 
clinical “adjustment” scales and posi- 
tively correlated (r’s ranging from 
.33 to .58) with the K scale, inter- 
preted as a measure of test-taking 
defensiveness (McKinley, Hathaway, 
& Meehl, 1948). Zuckerman and 
Monashkin took their findings to 
mean that “both self-acceptance and 
MMPI scales are probably being 
influenced more by the common trait 
of defensiveness than by actual ad- 
justment” (p. 147). The term “de- 
fensiveness,” with its connotation of 
maladjustment, seems less applicable 
here than “social desirability,” espe- 
cially in view of the high correlation 
(.81) reported by Edwards (1957) 
between the K scale and his Social 
Desirability Scale. With approxi- 
mately 65% of the variance accounted 
for in the covariation of these two 
scales, the results of the three studies 
cited above would seem to be a func- 
tion of the common denominator of 
social desirability. Thus, the items 
on the self-acceptance tests used and 
those on the MMPI are highly re- 
lated to the scale values on Edwards’ 
Social Desirability Scale. 
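
The figure of approximately 65% follows from squaring the reported correlation, on the usual assumption that the squared Pearson coefficient gives the proportion of variance shared by two measures. The subscripts below are notation introduced here for illustration only:

```latex
r_{K,\,SD}^{2} = (.81)^{2} \approx .656
```

That is, roughly two-thirds of the variance in either scale is common to both.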

In the study referred to earlier, 
Cowen and Tongas found a correla- 
tion of .91 between social desirability 
ratings and the self-concept score of 
the IAV. A correlation of .96 was 
obtained between social desirability 
ratings and the ideal-self score on the 
IAV. The latter correlation might 
be taken to suggest culturally stereo- 
typed conceptions of what one ought 
to be that would be consistent with 
Hypothesis IV above. In another 
investigation (Nebergall, Angelino, & 
Young, 1959), it was found that sub- 
jects who reported high self-accept- 
ance tended to disagree with group 
judgments of adjustment. For most 
subjects, in fact, self-acceptance rat- 
ings were higher than group ratings. 
Again, these findings may be under- 
stood in terms of the individual’s 
need to present himself in what he 
regards as a culturally sanctioned 
manner. 

While this discussion has been 
concerned primarily with the social 
desirability factor in self-acceptance 
tests, it seems highly probable that 
any self-report device will be affected 
by the social desirability of items or 
of available responses. Failure to 
control for the effects of this variable 
by one of several available procedures 
(Edwards, 1957) means, in effect, 
that the test in question may better 
be interpreted as a measure of social 
desirability (that is, the subject’s 
conception of social desirability or 
need to perform according to it) than 
of self-acceptance. This can be il- 
lustrated by means of an hypothetical 
experiment. It might be hypothe- 
sized that need-determined percep- 
tual behavior—for example, percep- 
tual reactivity to threat—is related 
to self-acceptance (cf. Cowen et al., 
1957). Failure to control for social 
desirability in the self-acceptance 
assessment operations would make 
the results, no matter what the out- 
come, uninterpretable in terms of self- 
acceptance. In the light of what is 
already known about the influence 
of social desirability on self-report 
devices, the most probable interpre- 
tation of such an experiment would 
be that perceptual reactivity to 
threat is related (or unrelated) to the 
socially desirable responding of sub- 
jects—that is, their need to be per- 
ceived in a particular way or their 
conception of how they want to be 
perceived. Not provided in this ex- 
periment are the operations for deter- 
mining the relationship between 
perceptual reactivity and “real” self- 
acceptance. 

While studies of the effect of the 
social desirability variable on many 
of the commonly employed tests of 
self-acceptance have not been done, 
the results of the investigations dis- 
cussed above would suggest that self- 
evaluative tests are particularly sus- 
ceptible to criticism on social desir- 
ability grounds. A common denomi- 
nator in research findings on self- 
acceptance may well be the variable 
of social desirability. Edwards (1957) 
and Jackson and Bloomberg (1958) 
have made a similar analysis with 
respect to the Taylor anxiety scale 
(Taylor, 1953). Systematic investi- 
gation of both the parameters and 
the effects on test behavior of social 
desirability would clearly seem to be 
in order. That self-acceptance tests 
are influenced by factors other than 
the manifest content of the items, 
however, seems beyond dispute. 


THE GENERALITY OF SELF- 
ACCEPTANCE 

To this point the issues discussed 
have been pertinent strictly to psy- 
chometric and methodological prob- 
lems in assessing self-acceptance. A 
further issue to be raised, although it 
certainly has methodological rami- 
fications, is the primarily theoretical 
question of the generality of self- 
acceptance. 

Generality involves two related 
problems, one empirical and the other 
a theoretical problem of interpreta- 
tion. Empirically, there is need of 
evidence concerning the temporal 
stability of self-acceptance; the con- 
sistency of an individual’s self-ac- 
ceptance from one situation to an- 
other (for example, in friendly vs. 
hostile groups, or where self-efface- 
ment is rewarded or not rewarded); 
the generality of self-acceptance in 
reference to different aspects of the 
“self” (for example, in reference to 
morality vs. in reference to interper- 
sonal effectiveness); and agreement 
of different kinds of manifestation of 
self-acceptance (for example, spon- 
taneous self-appraisal vs. that mani- 
fested in an undisguised test such as 
the ACL vs. inferences drawn from a 
TAT protocol). The theoretical 
question is simply how best to con- 
strue self-acceptance. If, as has been 
suggested (Rogers, 1951), the self- 
concept and self-acceptance can be 
considered to be relatively stable 
characteristics of a person, one should 
find that situational variables have 
only a negligible effect on self-accept- 
ance, that measures of self-accept- 
ance taken in different social contexts 
are highly correlated, and that meas- 
ures taken over temporal intervals 
are likewise highly stable. If these 
questions can be answered positively, 
it would be reasonable to construe 
the self-concept, from which the dis- 
crepancy notion of self-acceptance is 
derived, as a meaningful variable on 
which there are consistent differences 
between subjects, and it would be 
highly appropriate to think of in- 
dividuals in terms of their character- 
istic levels of self-acceptance. To the 
degree that self-acceptance is a func- 
tion of variables associated with 
specific situations or types of situa- 
tions, however, it will be more fruit- 
ful to investigate self-evaluative be- 
havior per se and its situational de- 
terminants. 

The empirical evidence with re- 
spect to the generality of self-accept- 
ance is rather scanty. The fact that 
studies have not attacked this ques- 
tion may be attributable to the gen- 
eral assumption that self-acceptance 
is consistent. Three investigations 
have been reported which do bear on 
this question. With respect to tem- 
poral stability, Taylor (1955) reports 
a test-retest correlation of .79 (pre- 
sumably based on self-sort–ideal- 
sort discrepancy scores) over an 
interval of a week. Butler and Haigh 
(1954) report the correlations be- 
tween self-sorts and ideal-sorts for 
each subject in a control group 
(N=16) not receiving therapy for 
two Q sort administrations separated 
by a considerable period of time. 
Although consistency was apparent, 
Butler and Haigh noted that 

there are some sharp individual changes which 
indicate that alteration in self-ideal congru- 
ence does occur at times in the absence of 
therapy (p. 67). 


Concerning the influence of situa- 
tional variables on self-acceptance, a 
study by Thorne (1954) is relevant. 
Employing the IAV, Thorne found 
that following induced failure on a 
mirror drawing task, subjects whose 
initial level of self-acceptance was 
high tended to lower their self-ratings 
in the direction of a more realistic 
evaluation, while originally low self- 
accepting subjects tended to increase 
self-acceptance scores and showed 
concern over loss of self-esteem. The 
results of this study would suggest 
that self-acceptance is influenced by 
environmental events and that per- 
sons respond self-reflexively to per- 
ceived successes and failures. 

It would appear, from this brief 
discussion, that studies should be 
devoted to the problem of the gen- 
erality of self-evaluative behavior. 
Of particular interest are the ques- 
tions of temporal stability, influence 
of situational variables, and the 
effect on self-evaluation of such fac- 
tors as success, failure, and punish- 
ment. 


SELF-ACCEPTANCE Vs. SELF- 
EVALUATIVE BEHAVIOR 


It has been necessary at several 
points in this discussion to point out 
differences between a phenomeno- 
logical and a behavioristic approach 
to self-acceptance. Since these differ- 
ences are basic to the research ap- 
proaches—not to mention the way in 
which such research is construed— 
in this general area of inquiry, and 
since these differences seem not to 
have been fully appreciated by all 
who have written on the topic, some 
further discussion of them is in order. 

A phenomenological approach to 
self-acceptance is concerned with 
self-acceptance itself, or “real” self- 
acceptance, as a totally private, sub- 
jective experience of the subject. By 
definition this is never observable by 
any other; the best that an experi- 
menter or clinician can hope to do is 
make relatively accurate guesses, or 
inferences, concerning the existence, 
or degree, of the variable as it “exists” 
in the subject. By such a definition, 
self-acceptance corresponds to Mac- 
Corquodale and Meehl’s (1948) early 
conception of an “hypothetical con- 
struct”—something which cannot be 
observed but still is assumed to 
exist—except that there is little 
suggestion that self-acceptance even 
can be observed by anyone other than 
the subject himself. It is only with 
some difficulty, it would seem, that a 
phenomenologist can avoid the neces- 
sity of assuming the validity of self- 
reports. Representative sampling, 
and also an idiographic procedure for 
determining what are the most salient 
aspects of a subject’s self-evaluation, 
would seem to be most important in a 
phenomenological approach to the 
assessment of self-acceptance. Social 
desirability, on the other hand, 
should be assumed not to be a factor 
in self-reports. To assume a high de- 
gree of generality or consistency— 
temporal, situational, etc.—is not 
necessarily essential to a phenomeno- 
logical approach; however, in any 
theory which posits generalized self- 
acceptance as an important dimen- 
sion on which to compare people, 
empirically determined generality of 
the variable is, naturally, crucial. 

A behavioristic concern with self- 
acceptance might more clearly be 
directed toward “self-evaluative be- 
havior,” on the other hand. The ad- 
ditional inference of some underlying, 
real if unobservable, phenomenologi- 
cal state is not essential to a study of 
self-evaluative behavior per se; and it 
might be pointed out that self-evalua- 
tive behavior is an interesting and 
perhaps important focus of interest 
in and of itself. In such an approach, 
the assumption of validity of self- 
reports is clearly not essential; a 
clear construct-level definition of 
self-evaluative behavior, on the other 
hand, is. Generality, representative 
sampling in test construction, and the 
related question of equivalence of 
assessment operations are crucial 
questions only if the goal is to ap- 
proach self-evaluative behavior as a 
trait, or consistent behavioral tend- 
ency, by which to classify people in a 
generalized fashion. It is quite feas- 
ible to examine self-evaluative be- 
havior as a situationally determined 
phenomenon, or as one determined 
by a situation-person interaction, 
rather than as a trait. Social desir- 
ability, defensiveness, etc., become 
merely other variables related (or 
unrelated) to self-evaluative behavior,
and not components of error vari- 
ance. And, most important, it 
becomes an empirical matter to determine
correlates (such as “adjustment”)
of various forms of self-evaluative
behavior, either in general or
in specified contexts. 

This discussion is not meant to 
imply that a phenomenological inter- 
est in self-acceptance is unsophisti- 
cated or unworthy. Theoretical 
understanding of phenomenal states 
is a problem of inference. A clearer
conception of “internal” phenomenal
states such as self-acceptance would 
seem to be best derived from the ob- 
servable behaviors of the person— 
that is, his self-evaluative behaviors. 
Phenomenological research would ap- 
pear, in fact, to involve complexities 
that do not attach to more behavior- 
istic efforts. 


SUMMARY AND CONCLUSIONS 


“Self-acceptance” promises to become
an increasingly attractive focus
of interest in both formal and infor- 
mal psychological theory. A con- 
siderable volume of research has al- 
ready been devoted to the topic, and 
a sizeable number of tests devised for 
such research. To this date, however, 
research has contributed an unknown, 
but perhaps very small, amount of 
understanding of self-acceptance and 
its relationships to other personality 
variables. The failures of self-accept- 
ance research can be traced, at least 
in large part, to neglect of several 
crucial psychometric and methodo- 
logical principles: the unsupported 
assumption of equivalence of assess- 
ment procedures, the absence of any 
clear construct-level definition of the 
variable, failure to construct tests in 
accord with principles of representa- 
tive sampling, and questions concern- 
ing the social desirability factor in 
self-report tests. In addition, the
absence of data concerning the generality
of self-acceptance makes research
results even more difficult to
interpret; and the implications of the
difference between a phenomenological
approach to self-acceptance and a
behavioristic approach to “self-evaluative
behavior” have not been
clearly understood.

The relative absence of systematic 
efforts in test development, standard- 
ization, and validation in this area is 
perhaps due to the fact that the focus
of self-acceptance research to date has
been chiefly on the preliminary testing
of hypotheses, rather than the
development of adequate tests as a 
primary aim. A test designed solely 
for the purpose of testing one or two 
hypotheses does not, it might be 
argued, require so much care as a test 
designed to serve as a standardized 
instrument for many purposes. Indeed,
such an argument would continue,
this care and time are not
usually appropriate for such restricted
purposes. (The development,
use, and subsequent misuse of the
Taylor Manifest Anxiety Scale would
serve as a case in point, as Taylor
herself, 1956, has pointed out.) But
when such tests are then used in further
research as if they had been carefully
and adequately constructed,
little can ensue but error and confu- 
sion. And such seems to be the case 
in self-acceptance research. 

Perhaps it is true that these tests 
are not yet used commonly in clinical 
settings where their inadequacies 
could lead to disservice to the client; 
perhaps it is true that the tests are 
used for very little other than re- 
search. But this only makes rigorous 
test construction the more important 
if research in such a complex area is to 
produce dependable and unambiguous
results.

It is the purpose of this article to
review methods which have been suggested,
either directly or indirectly,
for the construction of unidimen- 
sional tests. No general survey of this 
topic appears to have been made 
previously but much help was ob- 
tained from critiques by Loevinger 
(1948), Guttman (1950a, 1950b, 1950c),
and White and Saltz (1957). 

Definition of unidimensional tests. 
A unidimensional test may be defined 
simply as a test in which all items are 
measuring the same thing. A set of 
high jumps or a set of broad jumps is 
unidimensional. A mixture of high 
jumps and broad jumps is not. In 
psychological tests, however, items 
which appear to be of the same sort 
often turn out on closer investigation 
to be measuring different things so 
that this simple definition will not 
suffice for the construction of tests. 

A more precise definition is given 
by considering the answer pattern 
that would be yielded by a unidimen- 
sional test with infallible items. If 
the items are arranged in order of
difficulty, placing the easiest first, it
will be found that a person who fails 
the first will fail all the other items; a 
person who passes the first and fails 
the second will fail all the subsequent 
items and so on. That is, the pattern 
of responses for five items could only 
be one of the forms shown in Table 1. 

With fallible items where the result 
may be affected by fluctuations in 
the ability of the subjects or in the 
difficulty of the items a perfect answer 
pattern may not be found even when 
the items do systematically measure 
the same thing. For our purposes,
however, it is sufficient to take the
answer pattern of Table 1 as providing
a working definition of unidimensionality,
remembering that with fallible
items the answer pattern will be
disturbed by random error.

TABLE 1

PATTERNS OF RESPONSES FOR A UNIDIMENSIONAL
TEST OF FIVE INFALLIBLE ITEMS

Total            Item*
Score    1    2    3    4    5

  0      F    F    F    F    F
  1      P    F    F    F    F
  2      P    P    F    F    F
  3      P    P    P    F    F
  4      P    P    P    P    F
  5      P    P    P    P    P

*P = Pass, F = Fail.
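
The answer patterns that define a perfect scale can be generated and checked mechanically. The following is a minimal sketch rather than part of the original procedure: the function names are illustrative, items are assumed to be ordered from easiest to hardest, and responses are assumed to be coded 1 for pass and 0 for fail.

```python
def perfect_patterns(n_items):
    """Return the n_items + 1 answer patterns of a perfect scale.

    Items are assumed ordered from easiest to hardest; 1 = pass, 0 = fail.
    Pattern k has exactly the k easiest items passed.
    """
    return [[1] * k + [0] * (n_items - k) for k in range(n_items + 1)]


def is_perfect_pattern(responses):
    """True if no passed item follows a failed (easier) item."""
    seen_fail = False
    for r in responses:
        if r == 0:
            seen_fail = True
        elif seen_fail:  # a pass on a harder item after a fail
            return False
    return True


if __name__ == "__main__":
    # Print total score and pattern, one line per admissible pattern.
    for pattern in perfect_patterns(5):
        print(sum(pattern), "".join("P" if r else "F" for r in pattern))
```

With fallible items some observed patterns will fail this check through random error alone, which is why the methods reviewed here need more than simple inspection.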

Criteria for evaluation. In evaluat- 
ing the methods, major consideration 
will be given to the extent to which a 
method provides for: 

1. A rational procedure for item 
selection 

2. A criterion of unidimensionality

3. An index or measure of unidi- 
mensionality 

A rational procedure for item selec- 
tion is essential. Any method which 
provides no adequate indication of 
the most likely items to be discarded 
from the pool and which relies on a 
blind trial-and-error procedure to 
discover the unidimensional set of 
items will be hopelessly uneconomical 
for practical test construction. In 
general the method should be convergent
so that the homogeneity of
the item set increases as the procedure
is applied and items are progressively
removed from the original
pool. Minor departures from this 
principle at critical stages (usually 
the beginning) are permissible so long 
as the number of trials to reach a con- 
vergent state of affairs is not too 
large. 

A criterion of unidimensionality is 
necessary so that checks can be made 
from time to time and decisions made 
either to continue culling of the item 
pool or to stop culling because a 
homogeneous set of items has been 
obtained. 

An index of the closeness of ap- 
proximation to unidimensionality is 
also required. Failure to find a set of 
items which meets a strict criterion 
of unidimensionality is certainly pos- 
sible and, indeed, very likely. The 
set of items in question may, however,
be more unidimensional than
any other measuring instruments
available and would be preferable to
a completely heterogeneous set of
items alleged to measure the same
attribute. The index of unidimensionality
may be related to the procedure
for selecting items and/or to
the proposed criterion of unidimen- 
sionality, or, on the other hand, may 
be quite independent of either of 
these. It would be desirable for the 
sampling distribution of the index of 
unidimensionality to be known. 

It should be noted that the index 
of unidimensionality is not quite the 
same as the measures of reproduci- 
bility discussed by White and Saltz 
(1957). Reproducibility as defined 
by White and Saltz confounds relia- 
bility and dimensionality since the 
measures are affected by random 
errors as well as by systematic differ- 
ences in item content. An index of 
unidimensionality appropriate to the 
definition used here should be inde- 
pendent of random error. 


Methods to be reviewed. Explicit
consideration will be given only to
classical item analysis, Loevinger’s 
technic of homogeneous tests, the 
independence criterion method, Gutt- 
man’s answer pattern method, and 
factor analysis. Most other methods 
are special cases of one or other of 
the listed methods and for the pur- 
pose of this review it is unnecessary 
to consider them. For example, 
criticisms of the Guttman procedure 
will apply also to the Cornell tech- 
nique (Guttman, 1947a) and H tech- 
nique (Stouffer, Borgatta, Hays, & 
Henry, 1952). Certain related tech- 
niques such as the Thurstone attitude 
scaling methods give tests of uni- 
dimensionality as a by-product but 
as test construction methods they 
are subject to the same criticisms as 
classical item analysis and the inde- 
pendence criterion. 


CLASSICAL ITEM ANALYSIS 


Classical item analysis using an 
internal criterion attempts among 
other things to increase the average 
item-test correlation by selecting 
from the item pool those items which 
have the highest item-test correla- 
tion. It is well-known that this pro- 
cedure tends to increase the homo- 
geneity of the test. 
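
The item-test correlation on which this kind of culling rests can be sketched as follows. The sketch assumes dichotomous items coded 1/0 and uses the point-biserial (product-moment) correlation in place of the biserial coefficient discussed in the text; the function name and the data layout (one list of responses per subject) are illustrative assumptions. Each item is left in the total score, so the correlations are not corrected for overlap.

```python
from math import sqrt


def item_test_correlations(data):
    """Point-biserial correlation of each item with the total score.

    data: list of subjects, each a list of 0/1 item responses.
    Returns one correlation per item (nan when undefined).
    """
    n = len(data)
    k = len(data[0])
    totals = [sum(row) for row in data]
    mean_t = sum(totals) / n
    sd_t = sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    result = []
    for j in range(k):
        item = [row[j] for row in data]
        p = sum(item) / n  # proportion passing item j
        if sd_t == 0 or p in (0.0, 1.0):
            result.append(float("nan"))  # no variance, no correlation
            continue
        cov = sum((x - p) * (t - mean_t) for x, t in zip(item, totals)) / n
        result.append(cov / (sqrt(p * (1 - p)) * sd_t))
    return result


if __name__ == "__main__":
    # Four subjects showing a perfect three-item answer pattern.
    scale = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
    print(item_test_correlations(scale))
```

A culling step of the classical kind would then drop the items with the lowest values and recompute the totals.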

From Table 1 it will be clear that 
for infallible items forming a unidi- 
mensional test the item-test correla- 
tion will be the maximum permitted 
by the shape of the distribution of
test scores. With the answer pattern 
of Table 1 there is no overlap in the 
distribution of test scores for those 
who pass and those who fail a given 
item. The difference in mean test 
scores of passers and failers is thus a 
maximum and the biserial correla- 
tion between item and test is conse- 
quently maximized. It would appear 
then that if the culling of items proceeds
to the point where the item-test
correlations are all maximized the
resulting test would be unidimen- 
sional. There are a number of diffi- 
culties which make this program un- 
likely to succeed. 

With fallible items the maximum 
item-test biserial will not be reached. 
One solution would be to correct the 
obtained biserials for attenuation 
using estimates of the reliability of 
item and test scores. Accurate esti- 
mates of the reliability of a single 
item are not easily obtainable. As- 
suming that this difficulty can be 
overcome a test would be regarded as 
unidimensional if the biserial correla- 
tions between item and test ap- 
proached the maximum after correc- 
tion for attenuation. 
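
The correction referred to here is the standard correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. A minimal sketch follows; the function name and the numerical values in the usage example are hypothetical and are not taken from any study cited here.

```python
from math import sqrt


def correct_for_attenuation(r_observed, rel_item, rel_test):
    """Disattenuated correlation: r_observed / sqrt(rel_item * rel_test).

    rel_item and rel_test are reliability estimates in the interval (0, 1].
    """
    if not (0 < rel_item <= 1 and 0 < rel_test <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / sqrt(rel_item * rel_test)


if __name__ == "__main__":
    # Hypothetical values: observed item-test r of .42,
    # item reliability .49, test reliability .81.
    print(correct_for_attenuation(0.42, 0.49, 0.81))
```

Because the item reliability sits under the square root in the denominator, a poor estimate of it changes the corrected value considerably, which is the practical difficulty noted in the text.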

Even granted the assumption that 
accurate estimates of item reliabilities 
can be obtained the method is not 
satisfactory. Consider the set of 
items with factor constitutions as 
follows: 


$$
\begin{aligned}
x_1 &= ma + nb + e_1 \\
x_2 &= ma + nb + e_2 \\
x_3 &= ma + nb + pc + e_3 \\
x_4 &= ma + nb + qc + e_4 \\
x_5 &= ma + nb + rc + e_5
\end{aligned}
$$


where $a$, $b$, and $c$ represent different
orthogonal common factors; $m$, $n$,
$p$, $q$, and $r$ are loadings; and $e_1$, $e_2$, $e_3$,
$e_4$, and $e_5$ are error factors.

Lumsden (1957) has shown that
Items 1 and 2 form a unidimensional
subtest and that Items 3, 4, and 5
with differing loadings on $c$ are not
unidimensional. Yet the method of 
maximizing item-test correlations will 
eliminate Items 1 and 2 first and no 
unidimensional test will be discovered. 
The only way out of this impasse 
would be to try sets at random which 
would make the procedure nonra- 
tional. 
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For this method the criterion of 
unidimensionality would be maxi- 
mum biserial after correction for at- 
tenuation. No sampling distribution 
of corrected biserials appears to be 
available so that the significance of 
departures from the perfect fit cannot 
be assessed. This is especially important
in this case since the estimates of
item reliability on which correction 
is based are likely themselves to be 
quite unreliable. 

The logical measure of unidimen- 
sionality would be average corrected 
biserial. This would need to be con- 
sidered relative to the maximum ob- 
tainable biserial (biserial r has a maxi- 
mum of 1.0 only when the continuous 
variable is normally distributed). A 
ratio of corrected biserial to its maximum,
similar to Loevinger’s $H_t$,
suggests itself but the absence of a
knowledge of its sampling distribution
would restrict its value. 

An obvious possibility would be to 
use the Kuder-Richardson Formula 
20 with correction for variation in 
item difficulty suggested by Horst 
(1953). This statistic is, however, 
affected by random as well as systematic
variance and is, therefore, a
measure of reproducibility rather 
than an index of unidimensionality. 
There would seem nothing to prevent 
the development of an index based 
on the ratio of obtained K-R 20 to 
the maximum K-R 20 for items with 
a given amount of random error.¹
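
For reference, the uncorrected Kuder-Richardson Formula 20 can be computed as in the sketch below. It assumes dichotomous items coded 1/0 and variances computed with divisor N; it omits Horst's correction for variation in item difficulty and does not implement the ratio index proposed in the text. The function name and data layout are illustrative.

```python
def kr20(data):
    """Kuder-Richardson Formula 20 for 0/1 item data.

    KR-20 = (k / (k - 1)) * (1 - sum(p_j * q_j) / variance of total scores)
    data: list of subjects, each a list of 0/1 item responses.
    """
    n = len(data)
    k = len(data[0])
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(row) for row in data]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    if var_t == 0:
        raise ValueError("total scores have no variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in data) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_t)


if __name__ == "__main__":
    scale = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
    print(kr20(scale))
```

Even a perfect answer pattern gives a value below 1.0 when the items differ in difficulty, which illustrates why a correction for variation in item difficulty, or a ratio to the maximum obtainable value, is needed before the statistic can serve as an index.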

A search of the literature has not
revealed any writer who has advocated
the use of classical item analysis
techniques as described above in
order to produce unidimensional tests.
Thorndike attempted to demonstrate
the “homogeneity of intellect CAVD”
by correlating scores on subgroups of
items with scores on the total set of
items and correcting the obtained
r’s for attenuation. Evidence was
presented (Thorndike, Bergman,
Cobb, & Woodyard, 1926, p. 566) that
these corrected correlations approximated
1.0 and Thorndike concluded
that this demonstrated the homogeneity
of CAVD tests. The logic of
Thorndike’s procedure is impeccable
if applied to single items or to randomly
selected subgroups of items
but his subgroups were arranged so
as to have, like the total set, equal
numbers of Completion, Arithmetic,
Vocabulary, and Directions items.
Thorndike was thus merely able to
show that the composite score obtained
from his subsets was similar
to the total score obtained from the
complete set but not that the subsets
or the total set were homogeneous in
the sense used here. It is only fair to
point out that Thorndike was mainly
concerned to show that his easier
sets of items and his harder sets gave
the same sort of results as the total
set.

Wherry and Gaylord (1943) suggest
as an alternative to factor analysis
an iterative procedure based on
classical item analysis. In this procedure
each item is correlated with
total score; those items with the
highest correlations are selected and
a new total formed; all items (including
those not selected in the first
stage) are then correlated with the
new total and the procedure is continued
until a stable group of items
is obtained. White and Saltz (1957)
commend this method but it would
not appear to avoid any of the difficulties
of classical item analysis.


¹ I am indebted to John Ross (University of
Sydney) for this suggestion.


LOEVINGER’S TECHNIC OF HOMOGENEOUS TESTS


Loevinger’s procedure is closely
related to classical item analysis and
indeed she indicates (Loevinger, 1947,
p. 26) that the earlier work by Thorndike
on the CAVD tests may have
been influential in the development
of her procedure.

The procedure is based on two
statistics: the “homogeneity of an
item with a test” and the “homogeneity
of a test.” The first of these
is to be used as a tool for item selection
and is a development of Long’s
(1934) index of overlapping. The
formula for this is given by Loevinger
as:

$$
H_{it} = 1 - \frac{\sum (\text{“passes” below or tied with “fails”})}{PQ - \text{“passes” one above “fails”}}
$$

where P is the number passing the 
item and Q is the number failing the 
item. It is clear that for a perfectly 
unidimensional test as defined by 
Table 1 $H_{it}$ will equal 1.0 since there
will be no subjects who pass an item 
who will have scores below or tied 
with subjects who fail the item. Us- 
ing this statistic to cull a mixed set of 
items will, however, be subject to all 
the difficulties encountered with clas- 
sical item analysis. 

The index of unidimensionality is
provided by the “homogeneity of a
test,” $H_t$. Loevinger notes that for a
perfectly heterogeneous test $p_{i|j} = p_i$
(i.e., the probability of passing an Item $i$
having passed another Item $j$ is the
same as the overall probability of
passing Item $i$). For a perfectly homogeneous
test as defined by Table 1,
$p_{i|j} = 1.0$ for all cases where $p_i > p_j$
(i.e., where Item $i$ is easier than Item
$j$). From this it will be seen that
$p_{i|j}$ has a minimum value of $p_i$ for all
cases where $p_i > p_j$.

Loevinger then considers the sum:

$$
S = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \left( p_{i|j} - p_i \right)
$$

where $m$ is the number of items and
the item pairs are all such that
$p_i > p_j$.

This sum will have a maximum
value given by

$$
S_{\max} = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \left( 1 - p_i \right)
$$

for a perfectly homogeneous test and
a value of zero for a perfectly heterogeneous
test. To provide an index
with the formal properties of a minimum
of zero and a maximum of 1.0,
Loevinger divides $S$ by $S_{\max}$ to give:

$$
H_t = \frac{S}{S_{\max}} = \frac{\sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \left( p_{i|j} - p_i \right)}{\sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \left( 1 - p_i \right)}
$$

Loevinger provides a formula for
estimating $H_t$ from sample statistics
but points out that the sampling
distribution is unknown and that the
estimate is not even known to be
unbiased.
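
The ratio of S to its maximum can be illustrated with a simple plug-in computation from observed proportions. This is a sketch under stated assumptions and is not Loevinger's own estimating formula: it takes the product of the pass rate of the harder item and the conditional pass rate of the easier item to be the joint proportion passing both, so that each term of S is the joint proportion minus the product of the two pass rates, and each term of the maximum is the pass rate of the harder item minus that product. The function name and data layout are illustrative.

```python
def loevinger_ht(data):
    """Plug-in value of S / S_max from 0/1 item data.

    For each item pair, i denotes the easier and j the harder item.
    S sums      p_ij - p_i * p_j   (observed association),
    S_max sums  p_j  - p_i * p_j   (association under a perfect scale).
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[c] for row in data) / n for c in range(k)]
    s = 0.0
    s_max = 0.0
    for a in range(k - 1):
        for b in range(a + 1, k):
            i, j = (a, b) if p[a] >= p[b] else (b, a)
            p_ij = sum(row[i] * row[j] for row in data) / n
            s += p_ij - p[i] * p[j]
            s_max += p[j] - p[i] * p[j]
    if s_max == 0:
        raise ValueError("index undefined: no item pair has usable variance")
    return s / s_max


if __name__ == "__main__":
    perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
    independent = [[0, 0], [0, 1], [1, 0], [1, 1]]
    print(loevinger_ht(perfect), loevinger_ht(independent))
```

The two data sets in the usage lines correspond to the limiting cases described in the text: a perfect answer pattern and a pair of statistically independent items.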


INDEPENDENCE CRITERION

Lazarsfeld (1950), Tucker (1952), 
and Lord (1952) have pointed out 
that with a unidimensional test the 
probability of success on one item is 
independent of success in any other 
item for subjects with the same true 
score. This is at first sight paradoxi- 
cal because it would seem obvious 
that items which are measuring the 
same thing should be highly corre- 
lated. But when only subjects of the 
same true ability are considered then 
items which are measuring this abil- 
ity and nothing else can differ only 
through error and will exhibit no
systematic variance. If we take subjects
who are exactly 6 feet tall then
different measures of height will vary 
only through error so that the meas- 
urements will be independent, uncor- 
related. The independence criterion 
is undoubtedly valid and is more gen- 
eral than any other. It makes no 
assumptions about the distribution 
of ability or rectilinearity of regres- 
sion. 

The criterion suggests a procedure 
for constructing unidimensional tests. 
It would be possible to obtain results 
on a pool of items from a large group 
of subjects, to choose a number of 
subjects with the same total score, 
and then to determine, say by χ²,
whether the items are independent
or not. If certain items turned out
not to be independent these could be
rejected, new totals worked for all
subjects in the original group, a new
group with the same total score determined,
and the χ² test repeated.
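
One step of this procedure, the test of independence for a pair of items among subjects with the same total score, can be sketched as below. The sketch assumes 0/1 item data and an ordinary Pearson chi-square on the 2 x 2 table, without continuity correction; the function name and the data in the usage example are hypothetical.

```python
def chi_square_same_total(data, item_a, item_b, total):
    """Pearson chi-square (1 df) for two items among subjects
    whose total score equals `total`.

    Returns None when the 2 x 2 table has an empty margin.
    """
    group = [row for row in data if sum(row) == total]
    n = len(group)
    table = [[0, 0], [0, 0]]  # table[response to a][response to b]
    for row in group:
        table[row[item_a]][row[item_b]] += 1
    row_tot = [sum(table[0]), sum(table[1])]
    col_tot = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if n == 0 or 0 in row_tot or 0 in col_tot:
        return None
    chi2 = 0.0
    for r in (0, 1):
        for c in (0, 1):
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    # Hypothetical data: 30 subjects, all with a total score of 2 on 3 items.
    data = [[1, 1, 0]] * 10 + [[1, 0, 1]] * 10 + [[0, 1, 1]] * 10
    chi2 = chi_square_same_total(data, 0, 1, total=2)
    print(chi2, chi2 > 3.841)  # 3.841 is the .05 critical value for 1 df
```

In practice the test would be repeated over item pairs and score groups, and some allowance would have to be made for the number of tests carried out.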

CONSTRUCTION OF UNIDIMENSIONAL TESTS

The true scores on the test are not,
however, known and the estimates
obtained from the raw test scores are
not satisfactory. O’Neil (1954) has
shown that, for subjects with the
same obtained score, items, even if
unidimensional, tend not to be independent
but to be negatively correlated.
If there are only two items, for
example, then for subjects with an
obtained score of 1 the items have a
tetrachoric correlation of −1.0 since
if a subject has the first item right he
must have the second one wrong and
vice versa. In mathematical terms
$p_{i|j} = 0$ instead of $p_i$ as required by
the independence criterion. This
effect is known to decrease as the
number of items is increased and it is
possible that the independence criterion
may be workable for fairly large
groups of items. With infallible items
there is, of course, no problem since
true scores will then equal the obtained
scores and the quoted example
could not occur (if the items were
unidimensional).
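
The two-item example can be checked numerically. The sketch below uses the phi coefficient (the product-moment correlation for a 2 x 2 table) in place of the tetrachoric correlation named in the text; for this example both take the value −1.0. The function name and the data are hypothetical.

```python
from math import sqrt


def phi_same_total(data, item_a, item_b, total):
    """Phi coefficient between two 0/1 items among subjects whose
    total score equals `total`; None if either item has no variance."""
    group = [(row[item_a], row[item_b]) for row in data if sum(row) == total]
    n = len(group)
    if n == 0:
        return None
    p_a = sum(a for a, _ in group) / n
    p_b = sum(b for _, b in group) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        return None
    p_ab = sum(a * b for a, b in group) / n  # proportion passing both
    return (p_ab - p_a * p_b) / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


if __name__ == "__main__":
    # Hypothetical two-item test taken by 20 subjects.
    data = [[1, 0]] * 6 + [[0, 1]] * 4 + [[1, 1]] * 5 + [[0, 0]] * 5
    print(phi_same_total(data, 0, 1, total=1))
```

Among subjects with an obtained score of 1 nobody passes both items, so the conditional pass rate is zero whatever the items measure.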

Even if this problem is overcome, 
the culling of items is likely to prove 
arduous. All items in the pool are 
likely to be correlated on the first 
trial. In the absence of any knowl- 
edge about the number of items in 
the unidimensional set it is impossible 
to say whether the unidimensional 
items will be more or less intercorre- 
lated than the items it is desired to 
reject. No rational, convergent pro- 
cedure of item culling is available 
using the independence criterion. 

No special index of unidimension- 
ality is suggested for this method. 
This, of course, does not matter, since
if the method were otherwise suitable
an index could be borrowed from one 
of the other methods. 


ANSWER PATTERN METHODS 


The Guttman procedure (1944) is
the most important of the answer
pattern methods and is the only one
discussed here. Some earlier writings
by Walker (1931, 1936, 1940) and
Ferguson (1941) have the first explicit
discussions of the relationship
between answer pattern and other
test characteristics but no suggestions
for test construction were made.

The answer pattern procedure con- 
sists essentially of inspecting the 
answer pattern and removing items 
so that the remaining items have 
patterns which are as near as possible 
to those of Table 1. It is clear that 
for infallible items this procedure 
could be easily carried out and that a 
simple inspection of the answer pat- 
terns would provide a clearcut crite- 
rion of unidimensionality. For items 
which exhibit slight departures from 
unidimensionality the procedure 
would be to eliminate items until the 
closest possible approximation consistent
with retaining a sufficient
number of items was obtained. For
this, some measure of the closeness of
approximation to unidimensionality
is required. Guttman uses the co- 
efficient of reproducibility which is 
the proportion of responses which can 
be correctly predicted from the total 
raw score. For a perfectly unidimen- 
sional test it will be seen from Table 1 
that the reproducibility coefficient 
will have the value 1.0. Guttman 
suggests that a test may be regarded 
as a “scale” (i.e., as unidimensional)
if the coefficient of reproducibility 
exceeds .90. 
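
A coefficient of this kind can be sketched as follows. The sketch assumes dichotomous items, orders the items by observed difficulty, and counts as an error every response that differs from the perfect pattern implied by the subject's total score; this is one of several error-counting rules in use and is not claimed to be Guttman's exact procedure. The function name is illustrative.

```python
def reproducibility(data):
    """1 - errors / (subjects * items) for 0/1 item data.

    The predicted pattern for a subject with total score s is
    'pass the s easiest items, fail the rest'.
    """
    n = len(data)
    k = len(data[0])
    passes = [sum(row[j] for row in data) for j in range(k)]
    order = sorted(range(k), key=lambda j: -passes[j])  # easiest first
    errors = 0
    for row in data:
        score = sum(row)
        for rank, j in enumerate(order):
            predicted = 1 if rank < score else 0
            if row[j] != predicted:
                errors += 1
    return 1 - errors / (n * k)


if __name__ == "__main__":
    perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
    print(reproducibility(perfect))
    print(reproducibility(perfect + [[0, 1, 0]]))  # one deviant subject
```

A single deviant subject among five lowers the value below .90 in this small example, which shows how strongly the coefficient depends on the number of items and subjects.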

The coefficient of reproducibility 
has been criticised severely by Fest- 
inger (1947) and Jackson (1949) be- 
cause it does not allow for the chances 
of obtaining high values when the 
items are heterogeneous (e.g., with 
only a few items of widely differing 
difficulties). Guttman (1947b) replied
to this criticism, claiming that such
factors as the number of answer cate- 
gories and the range of difficulty were 
taken into account before calculating 
the coefficient of reproducibility. 
Guttman does not give explicit rules 
but improvements to the reproduci- 
bility coefficient have been suggested 
by Jackson (1949) and Green (1954) 
which overcome some of the problems. 

The reproducibility coefficient, 
however modified, does not permit of 
a distinction being made between 
random and systematic scale dis- 
crepancies. Guttman claims (1950a, 
1950b, 1950c) that the distinction 
may be made by examining the pat- 
terns of scale discrepancies and pre- 
sents tables (p. 161) which purport to 
represent scale patterns for a perfect 
scale, a scale with random error, and 
a scale with systematic error. Evi- 
dence for random error in an item is 
said to be provided when scale errors 
are distributed randomly around the
cutting point for the item; evidence
for systematic error when the scale 
errors are grouped in a systematic 
fashion. While this claim is undoubt- 
edly correct (such systematic group- 
ings are the basis of all the statistical 
analyses proposed for the problem) it 
is difficult to see how these groupings 
may be discovered by inspection and 
distinguished from random errors 
when the random errors are fairly 
large. 

Guttman (1950b) has explicitly de- 
nied any intention to use scale anal- 
ysis for the selection of items. His 
scalogram was designed merely to 
discover approximate cutting points 
for attitude scale items. Guttman in- 
deed claims that the task of scale 
analysis is to discover scales rather 
than to construct them and states 
that if a universe of attributes is 
scalable then any subset of items 
from that universe is scalable. Item 
culling is by this argument unneces- 
sary. The difficulty is that a test 
constructor (or discoverer) does not 
know precisely what ‘‘universe of 
attributes” he is sampling. Without 
precise definition he may sample a 
number of related universes. Item 
culling procedures are designed to 
distinguish between groups of items 
selected from different universes. 

It may be seen then that the an- 
swer pattern method provides no 
rational culling plan for use with 
fallible items. The index of unidi- 
mensionality provided by the plan is 
the coefficient of reproducibility 
which, despite improvements on the 
early Guttman form, does not dis- 
tinguish between systematic and 
random error. 


JAMES LUMSDEN


FACTOR ANALYSIS


It is difficult to give due credit to
whoever first suggested the use of
factor analysis in the construction of
unidimensional tests. The idea is
sufficiently obvious to be thought at 
least implicit in the writings of 
Spearman, Thurstone (1947), and 
other early factorists. The factor 
analyses of test items by McNemar 
(1942), Burt and John (1943), and 
others clearly suggest it. Papers on 
related topics by Ferguson (1941), 
Wherry and Gaylord (1943, 1944), 
Carroll (1945), and Loevinger (1948) 
discuss with varying degrees of com- 
pleteness the possibility of factor 
analyzing items in test construction. 

Under restrictions which appear 
plausible for ability test items it is 
easy to show (vide Lumsden, 1957) 
that for a unidimensional test the 
matrix of tetrachoric item intercorre- 
lations is of unit rank. One factor 
analytic procedure for constructing 
unidimensional tests is to extract a 
single factor from the item intercor- 
relations, cull out the items which 
have large residuals, reanalyze, and 
continue until a satisfactory fit to a 
single factor solution is obtained. 
Wolfle (1940) in a well-known jibe at 
Brown and Stephenson (1933) said: 
“if one removes all tetrad differences 
which do not satisfy the criterion, 
the remaining ones do satisfy it” (p.
9). That is exactly what is done in 
this factor analytic technique of con- 
structing unidimensional tests. The 
difference between the two situations 
is, of course, that Brown and Stephen- 
son had asserted that their tests, all 
of them, would meet the tetrad differ- 
ence criterion, while here it is merely 
hoped that a subset of items will meet 
the criterion. 
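
One pass of the culling procedure described above can be sketched as follows. The sketch extracts a first centroid factor from a matrix of item intercorrelations, computes the residuals, and reports the item with the largest mean absolute residual as the candidate for removal. Several details are assumptions made for illustration: the communality estimates default to the largest absolute off-diagonal correlation for each item, the "large residual" rule is the mean absolute residual, and the function names are hypothetical. The correlations supplied may be tetrachoric or of any other kind.

```python
from math import sqrt


def first_centroid_factor(r, communalities=None):
    """First centroid loadings: column sums / sqrt(grand sum),
    with communality estimates placed in the diagonal."""
    k = len(r)
    if communalities is None:
        communalities = [
            max(abs(r[i][j]) for j in range(k) if j != i) for i in range(k)
        ]
    full = [
        [communalities[i] if i == j else r[i][j] for j in range(k)]
        for i in range(k)
    ]
    total = sum(sum(row) for row in full)
    if total <= 0:
        raise ValueError("sum of correlations must be positive")
    return [sum(full[i][j] for i in range(k)) / sqrt(total) for j in range(k)]


def worst_fitting_item(r, communalities=None):
    """Index of the item with the largest mean absolute residual after
    one factor is removed, together with all the mean residuals."""
    k = len(r)
    a = first_centroid_factor(r, communalities)
    mean_abs = []
    for i in range(k):
        res = [abs(r[i][j] - a[i] * a[j]) for j in range(k) if j != i]
        mean_abs.append(sum(res) / (k - 1))
    return max(range(k), key=lambda i: mean_abs[i]), mean_abs


if __name__ == "__main__":
    # Hypothetical matrix: four items share one factor; the fifth item
    # correlates only with the fourth.
    r5 = [
        [1.0, 0.64, 0.64, 0.64, 0.0],
        [0.64, 1.0, 0.64, 0.64, 0.0],
        [0.64, 0.64, 1.0, 0.64, 0.0],
        [0.64, 0.64, 0.64, 1.0, 0.6],
        [0.0, 0.0, 0.0, 0.6, 1.0],
    ]
    print(worst_fitting_item(r5)[0])
```

The full procedure would remove the flagged item, recompute the loadings on the remaining items, and repeat until the residuals are judged small enough.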

The procedure is quite simple. But 
is the culling procedure rational? 
Will the set of items converge to uni- 
dimensionality? 

It is evident that convergence of 
the factor analytic procedure to a 
unidimensional subset of items cannot
be guaranteed. If the unidimensional
set is much less numerous than
the heterogeneous items in the pool 
then it is probable that the unidi- 
mensional set will not have sufficient 
influence on the nature of the first 
factor extracted to prevent the oc- 
currence of large residuals among the 
unidimensional set. These items will 
be discarded first and the procedure 
will not converge to a single factor 
solution. 

If, however, the items are carefully 
preselected on empirical and a priori 
grounds, it seems likely that the state 
of affairs of the preceding paragraph 
will not occur. If items are deliber- 
ately made parallel or if there is evi- 
dence for parallelism then it would 
follow that the dimension of any 
unidimensional test and the dimen- 
sions of the heterogeneous items in 
the total pool, will normally be highly 
correlated. In this circumstance the 
influence of the unidimensional set on 
the first factor extracted may well be 
greater than the actual numbers of 
items suggest, and the method may 
therefore be expected to converge. 
The procedure of preselecting will 
also tend to increase the size of the 
unidimensional set in the pool and 
this will also increase the probability 
of convergence. 

Lumsden (1959) found that four 
subsets of number series items se- 
lected on a priori grounds converged 
rapidly and that three of them met a 
fairly stringent test of unidimen- 
sionality when cross-validated with a 
fresh group of subjects. 

One procedure that should almost 
guarantee convergence (if a sizable 
unidimensional set exists) is to carry 
out a preliminary complete centroid 
analysis and then to select for further 
analysis those items which appear in 
narrow strips (i.e., roughly co-linear) 
in the factor space. This appears to
be the procedure advocated by
Cattell (1957) for his factor homo- 
geneous scale except that he would 
require the additional restriction that 
the factor have significance in a more 
general factor space than that pro- 
vided by the item intercorrelations. 
The complete centroid procedure with 
rotation could indeed be used with- 
out further analysis except that the 
problems of estimating communali- 
ties and determining goodness of fit 
are more complicated than for the 
unit rank case. 

The criterion of unidimensionality 
suggested for item culling is the size 
of the residuals. This must be con- 
sidered with relation to the sampling 
distribution of residuals. Unfortu- 
nately there is no exact solution to 
this problem. Many methods have 
been suggested (Cattell, 1952) but 
none can be regarded as satisfactory. 
A reasonable solution for test con- 
struction purposes would be to use 
one of the simpler procedures (e.g., 
standard error of average r) and ap- 
ply it rather severely. Increased 
availability of automatic computing 
services may permit the use of maxi- 
mum likelihood methods of factoriz- 
ing which provide a test for rank. 

An index of unidimensionality ap- 
propriate to the method is the ratio 
of first factor variance to total bipolar 
factor variance after a complete 
centroid analysis with subjects who 
were not used for item selection. In 
most cases the ratio of first to second 
factor variance would seem to give a 
reasonably useful index. This index 
has no fixed maximum value and 
little is known about the extent to 
which it may be affected by errors of 
sampling or of measurement. 


DISCUSSION 


It seems clear that none of the 
methods examined can be regarded as
satisfying all three of the main criteria.
Only factor analysis appears to offer 
a rational procedure for item selec- 
tion. The criteria and indices of uni- 
dimensionality are unsatisfactory for 
all methods. 

This review has considered each of 
the methods as if they were complete, 
self-consistent creations of a single 
writer. With the exception of the 
Guttman answer pattern method and 
the Loevinger method this is not so. 
The various “natural” criteria and 
indices suggested for each of the 
methods are not necessary conse- 
quences of the choice of item selection 
method. Combinations of different 
elements from different methods are 
possible and this circumstance justi- 
fies a modified optimism. Thus a 
modification of the coefficient of re- 
producibility which produced an ac- 
ceptable index of unidimensionality 
would not be cogent evidence for 
adopting an answer pattern method 
but would greatly improve all meth- 
ods. 

Greatest emphasis has been deliber- 
ately placed on item selection ra- 
tionale since this topic appears to 
have been relatively neglected in the 
literature of the problem. Great 
advances appear unlikely unless the 
development of criteria and indices 
of unidimensionality is closely re- 
lated to item selection procedures. 


SUMMARY 


Five methods of constructing uni- 
dimensional tests (classical item anal- 
ysis, Loevinger’s procedure, the in- 
dependence criterion method, the 
answer pattern method, and factor 
analysis) have been considered with 
respect to their provision for: a ra- 
tional procedure for item selection, 
a criterion of unidimensionality, and 
an index of unidimensionality. 

It has been argued that only factor 
analysis provides a rational procedure 
for item selection. No method has a 
fully satisfactory criterion of unidimensionality. 
The index of unidimensionality 
suggested for the factor 
analytic method is the ratio of first 
to second factor variance. This 
suffers from the absence of any knowl- 
edge of sampling fluctuations, but 
this weakness is shared by the only 
reasonable alternative, the coefficient 
of reproducibility. 

For the present purposes we shall 
take our definition of Q methodology 
from Stephenson (1953), not be- 
cause he has been succinct, but be- 
cause he has been more comprehen- 
sive in his interest than other writers. 
Accordingly, Q as conceived by Cat- 
tell or Mowrer and related methodo- 
logical topics discussed by such 
writers as Cronbach are not included 
in the present review of recent 
studies. 

In his 1953 publication, The Study 
of Behavior, Stephenson informs us 
with a modesty which is characteristic 
for this book that “... the 
science of behavior can be immeasurably 
improved by attending to a 
few principles upon which we have 
based the method now well known as 
‘Q-technique’” (p. 1). Time does not 
permit the long series of quotations 
which would be necessary in order to 
indicate fully Stephenson’s concept 
of Q methodology, but a few addi- 
tional quotations remind us that he 
was definite in his point of view. 

Our object has been to make it possible for 
studies to be undertaken on single cases (p. 2).¹ 
Briefly, a statement of the kind “All crows 
are black” is a general proposition. To say 
that “A crow is black” is clearly singular, but 
not testable. When, however, we can point to 
a particular crow X and assert that it is black, 
a singular testable proposition is at issue 
(p. 42). There never was a single matrix of 
scores to which both R and Q apply (p. 15). 





1 “By a ‘single case’ we mean, for the moment, 
a single person under study or a single 
group of interacting persons. . . . what is in- 
volved is whether individual differences are 
postulated or whether singular propositions 
are being tested. The latter alone are our 
concern” (Stephenson, 1953, p. 2). 

A defining summary of Stephenson’s 
view of Q methodology could 
include at least six points: 

1. Q method appears to require 
ipsative variables, particularly Q 
sorts. 

2. Q method lends itself to corre- 
lations between people or between 
different conditions for the same per- 
son. 

3. Q method requires a concep- 
tually structured set of statements in 
order to interpret the correlations be- 
tween people—each set of statements 
comprising systematic combinations 
of different levels of the various hypo- 
thetical effects. 

4. Q method permits a study of a 
person by means of analysis of vari- 
ance of the statements, assuming that 
the sorted statements were initially 
structured as replications of the pos- 
sible combinations of a priori effects 
and levels of reaction. 

5. Q method favors a dependency 
type emphasis in factor analyses with 
rotations determined by the nature 
of the propositions concerning the 
variates. 

6. Q method leaves unanswered 
the question of the parent population 
from which the individual is drawn; 
the method examines singular prop- 
ositions on the assumption that some- 
where there are more people like the 
one under scrutiny. 
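
The ipsative character of a forced Q sort, mentioned in the first point, can be shown with a small sketch. The pile values, pile sizes, and function name below are hypothetical. The only claim illustrated is that when every sorter must use the same forced distribution, every sort has the same mean and variance, so that the scores convey only the ordering of statements within a person.

```python
from statistics import mean, pvariance

# Hypothetical forced distribution: pile value -> number of statements.
PILE_SIZES = {1: 1, 2: 2, 3: 4, 4: 2, 5: 1}  # 10 statements in all


def forced_sort(ranking):
    """Map statement ids, ordered from least to most characteristic,
    onto the pile values of the forced distribution."""
    values = [v for v, n in sorted(PILE_SIZES.items()) for _ in range(n)]
    if len(ranking) != len(values):
        raise ValueError("ranking must contain one entry per statement")
    return dict(zip(ranking, values))


person_a = forced_sort(list(range(10)))            # one ordering
person_b = forced_sort(list(reversed(range(10))))  # the opposite ordering

for sort in (person_a, person_b):
    piles = list(sort.values())
    print(mean(piles), pvariance(piles))
```

The two persons order the statements in opposite ways, yet their sorts are indistinguishable in level and spread. Differences between persons in over-all elevation are therefore not recoverable from such scores.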

To date the most conspicuous use 
of Q methodology has been made by 
the so-called ‘‘self psychologists” who 
view discrepancies between one’s 
self-perception and the perception of 
an ideal self as an indication of 
maladjustment. This interpretation 
of psychological maladjustment is 
consistent with Rogers’ belief in the 
self-actualizing function of the per- 
sonality and, as a consequence, finds 
direct application in his studies of the 
efficacy of psychotherapy (Rogers & 
Dymond, 1954). The work done by 
the Chicago group during the first 
half of the present decade used the 
now familiar device of correlating the 
individual’s Q sort describing his true 
self with his Q sort for his ideal self. 
Increases in these correlations dur- 
ing the course of therapy were taken 
as evidence of improvement. This 
was a notable application of Q meth- 
odology because in the opinion of 
many persons these studies com- 
prised the first acceptable indication 
that psychotherapy was efficacious 
in producing personality change. 
Beyond using Q methodology in es- 
tablishing this important landmark, 
the Chicago group was able to illu- 
minate some of the features of the 
psychotherapeutic process by factor 
analyzing the intercorrelations 
among various Q sorts for a given pa- 
tient. The case of Mrs. Oaks is illus- 
trative. During the course of therapy 
her concept of self became much more 
favorable, and there were some 
changes in her concept of an ideal 
self. Her therapeutic progress was 
summarized by a factor analysis of 
the intercorrelations among Q sorts 
made at various stages in the course 
of her therapy. 
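
The self-ideal correlation described above can be sketched as follows. The three sorts are invented pile values for ten statements and are not data from the Chicago studies. The sketch assumes the ordinary product-moment correlation between the pile values assigned to the same statements under the two instructions.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant list")
    return sxy / sqrt(sxx * syy)


# Invented pile values (1 = least like, 5 = most like) for 10 statements.
ideal_sort = [5, 4, 4, 3, 3, 3, 3, 2, 2, 1]
self_before = [3, 2, 4, 1, 3, 5, 3, 4, 2, 3]  # self sort before therapy
self_after = [4, 4, 5, 3, 3, 3, 2, 3, 2, 1]   # self sort after therapy

r_before = pearson_r(self_before, ideal_sort)
r_after = pearson_r(self_after, ideal_sort)
print(round(r_before, 2), round(r_after, 2))
```

A rise in the correlation from the first occasion to the second is the kind of change that was taken as evidence of improvement.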

There have been several evalua- 
tions of the validity of Q sorts as 
evidence of adjustment, particularly 
from the standpoint of their appro- 
priateness as criteria for therapeutic 
change. For example, Friedman 
(1955) supported the self-ideal consistency 
concept of good adjustment 
by a study which involved a comparison 
between normals and neurotics. 
The neurotic group was described 
as tending to regard their 
self-qualities as very much different 
from the way they would like them 
to be. Cartwright (1957) emphasized 
the consistency interpretation of 
good adjustment by showing that 
after psychotherapy subjects de- 
scribe themselves in relation to im- 
portant persons in their environment 
with as much consistency as controls. 
An increase in self-ideal congruence 
for a group of high school boys after 
counseling was reported by Caplan 
(1957), and Turner and Vanderlippe 
(1958) reported that college students 
with high self-ideal congruence 
tended to have more extracurricular 
activities and to have higher scholas- 
tic averages than students with low 
self-ideal congruence. 

The apparent validity of self-ideal 
congruence was examined by Chase 
(1957), who compared adjusted and 
maladjusted hospital cases with re- 
spect to the various possible correla- 
tions involving Q sorts. Only those 
correlations containing the self-sort 
distinguished between the adjusted 
and the maladjusted group. 

The Q sort approach to adjustment 
is subjected to further scrutiny by 
Kogan, Quinn, Ax, and Ripley (1957). 
Using two comparable samples, one 
psychiatric patients and the other 
university students, sorts for four 
different conditions were obtained. 
The average sorts for the patient and 
the student groups were correlated 
for each of the four conditions. It 
was found that a great portion of the 
variance in these correlations could 
be accounted for in terms of either of 
two extraneous variables, the social 
desirability of the sorted statements 
or a sickness-health variable. Ed- 
wards (1955) had described the im- 
portance of social desirability in Q 
sorts as early as 1955. 

There are reports which challenge 
the appropriateness of Q sorts as 
direct evidence of the efficacy of 
psychotherapy. For example, Taylor 
(1955) undertook a study on the as- 
sumption that repeated introspec- 
tion would produce therapeutic type 
changes in self-concept. His subjects 
made repeated Q sorts. In conse- 
quence of this procedure, there was 
an increase in positive self-concepts, 
and the self-ideal correlation in- 
creased. From this one need not in- 
fer that self-concepts are unstable, 
however. Engel (1959) examined the 
self-concepts of a group of adolescents 
in 1954 and again in 1956, and re- 
ported that items relative to positive 
self-concepts had appreciable stabil- 
ity as indicated by a stability corre- 
lation of .69. 

Levy (1956) challenged the mean- 
ing of self-ideal discrepancies by 
comparing self-ideal correlations 
based on the Butler and Haigh (1954) 
items with the correlation between 
sorts for an actual and an ideal home 
town. He found these two sets of 
actual-ideal correlations to be corre- 
lated with each other to the order of 
.70. Because of this, he suspects that 
the discrepancies perceived between 
actual and ideal states of affairs have 
implications for the individual’s ad- 
justment, regardless of the area in 
which the discrepancy is shown. 

Although Q sorts are frequently 
employed in the published literature, 
the use is often relatively uncritical. 
In some instances it would appear 
that a normative procedure would 
have served the investigator’s pur- 
poses better than the ipsative Q 
sorts. It appears probable, however, 
that investigators have been en- 
couraged by the availability of Q sort 
procedures, and some of the resulting 
studies might not have been undertaken 
if only a normative type emphasis 
had been available. For example, 
in Stewart’s (1958) study of 
the relationship between manifest 
anxiety and mother-son identifica- 
tion, facets of mother-son identifica- 
tion are readily revealed and usefully 
quantified by correlations between 
various sorts provided by the mothers 
and their sons. Stewart found that 
the boys with the greatest manifest 
anxiety were those with the greatest 
discrepancy between their self-per- 
ceptions and their mother’s ideal for 
them. 

Correlating Q sorts was a con- 
venient device for Kalis and Bennett 
(1957), who wished to show that 
communication between the patient 
and members of his family was im- 
proved for those patients whose hos- 
pitalization had been successful. The 
importance of similarity of self-per- 
ceptions in interpersonal relation- 
ships is further illuminated by Cor- 
sini’s (1956) use of Q sorts in his 
study of happiness in marriage. 
These studies are reminiscent of a re- 
port by Revie (1956), who used Q 
sorts to describe both the teacher’s 
and the school psychologist’s concept 
of pupils. As a result of their inter- 
action, both the teacher’s and the 
psychologist’s concept of the pupil 
changed. 

Q sorts have been used in many 
different ways, particularly in the 
study of personality. Shontz (1956) 
used Q sorts in order to examine the 
concept of a healthy personality, 
while Reznikoff and Toomey (1958) 
worked out a system of weightings 
whereby observers’ Q sorts of patients 
may be scored to estimate the degree 
of emotional disturbance. Epstein 
and Smith (1956) used Q sorts as a 
sociometric device by having stu- 
dents Q sort their fellows with re- 
spect to the degree of hostility in 
their behavior. Fiske and Van Buskirk 
(1959) used Q-sort procedures in 
order to examine the stability of sen- 
tence completion test interpretations, 
and Doleys and Kregarman (1959) 
report that self-ideal congruence does 
not measure frustration tolerance. 
Nahinsky (1958) used a self-ideal 
comparison to distinguish career 
from noncareer naval officers, and 
Whiting (1959) had nurses, patients, 
and physicians sort statements con- 
cerning the importance of various 
aspects of the nurse’s work. This appears 
to be the kind of study where 
rating scales, inventories, or check 
lists could not have served the in- 
vestigator’s purposes as well as the 
ipsative sort. 

The unique value of Q sorts has 
not been made sufficiently explicit to 
permit an investigator to know the 
kinds of situations which call for 
ipsative procedures and the kinds of 
situations in which his purposes will 
be better served by normative procedures. 
There are numerous studies in 
the literature which employ Q sorts 
without indicating why this particu- 
lar method was chosen. Sometimes 
it appears that Q sorts are used be- 
cause no reliable normative instru- 
ment is available to distinguish be- 
tween persons along a relevant con- 
tinuum. The question of the reliabil- 
ity or the validity of the Q sort is 
rarely raised, and if practice alone 
were considered, one could infer that 
reliable and valid ipsative distinc- 
tions based on a Q-sort procedure are 
much easier to establish than reliable 
and valid normative procedures. 
Even if this were true, and your re- 
viewer has not seen material which 
would lead to this conclusion, one 
could still question the use of an 
ipsative procedure showing intra-individual 
differences when a normative 
procedure dealing with inter-individual 
differences appears to be 
indicated by the general requirements 
of the investigation. For example, 
Morsh (1955) described the 
use of a Q-sort procedure in securing 
the classes’ evaluation of the teachers. 
Why an ipsative type evaluation of 
teachers is preferable to a normative- 
type procedure is not indicated in his 
report. 

In a study of the relationship be- 
tween some personality variables and 
speed in decision making, Block and 
Peterson (1955) used the staff’s Q 
sort of the subject as a measure of 
personality. Although the results of 
this investigation are interesting and 
worthwhile, it would appear that the 
emphasis is one of individual dif- 
ferences and that a normative meas- 
ure of personality would have been 
the logical choice. Both Cattell 
(1944) and Guilford (1954) have 
warned that ipsative measures should 
not be used in attempts to study in- 
dividual differences. The amount of 
error in such a maneuver need not be 
invariably great, however. For ex- 
ample, Block (1957) matched items 
from a Q sort with items that were 
used in a normative-type rating 
and found that in one sample the 
correlations between various items 
ranged from .63 to .88, while similar 
correlations for another sample 
ranged from .31 to .74. Apparently, 
the error involved in using ipsative 
item scores in a normative manner 
may vary greatly from item to item 
and from sample to sample. 

In addition to their applications in 
various studies of personality, Q 
sorts are also applied in the study of 
psychopathology. For example, 
Rogers (1958) found that the self- 
ideal congruence for paranoid schizo- 
phrenics was greater than for nor- 
mals. His approach is noteworthy 
because of its novelty. Instead of 
having his subjects sort cards, he 
asked them to manipulate a red 
square over a blue square, with the 
red square representing the self, the 
blue square representing the ideal, 
and the overlap representing the de- 
gree of congruence. Although the 
spatial interpretation that the sub- 
ject gives his judgment is absolute 
and could lend itself to normative 
treatment, the sense of these manip- 
ulations is clearly ipsative. This 
study, which was published in 1958, 
showed a high degree of self-ideal 
congruence for paranoids and can be 
compared with Friedman’s 1955 
study which included a sample of 
paranoid schizophrenics. Friedman 
found that only 3 of his 16 paranoids 
showed a low self-ideal correlation. 

Other schizophrenics are much 
more distinctive with respect to their 
behavior in the Q-sort situation. For 
example, Helfand (1956) asked various 
subjects, including schizophrenics, 
to simulate the Q sort of a 
former patient whose autobiography 
they read. He then computed the 
correlations between the sorts pro- 
vided by his various subjects and the 
sort provided by the former patient. 
He found that the schizophrenics’ 
simulated sorts had the lowest cor- 
relations of all. He ascribes this to a 
limitation in role-taking ability. A 
recent paper by Fagan and Guthrie 
(1959) tells us more about the 
schizophrenics. Their subjects were 
asked to describe themselves in one 
sort and to describe an average per- 
son in another. The subjects were in- 
tercorrelated for the two different 
sorts, and the two sets of intercorre- 
lations were factor analyzed. The 
authors concluded that schizophrenics, 
like many other patients, 
view themselves differently from the 
way they view other persons. 

The mothers of schizophrenic pa- 
tients have also been studied by Q 


J. R. WITTENBORN 


methodologists. Shepherd and Gu- 
thrie (1959) had the mothers of 20 
male schizophrenics sort 100 items 
concerning children and family life. 
These sorts made it possible to cor- 
relate each mother with every other 
one. The correlations were factor 
analyzed and the resulting factors— 
identified as Detached Authoritar- 
ianism, Inadequacy and Inconsist- 
ency, Pervasive Control, Sophis- 
ticated Denial of Inadequate Mother- 
ing, and Annoyance and Rejection— 
broaden our view of the various 
qualities or dimensions of schizo- 
phrenic mothering. One immedi- 
ately becomes interested in the man- 
ner in which one might generalize 
from these mothers of schizophrenics 
to other mothers and thereby gauge 
how broadly applicable one might 
find such dimensions of schizophrenic 
mothering. Unfortunately, this is 
one of the ways in which Q method- 
ology is weakest. We do not know 
what population the individual or 
individuals under scrutiny represent. 
Stephenson (1953) seems to feel that 
this really does not matter as long as 
he can assume that there are other 
similar individuals somewhere. He 
calls ducking this practical issue test- 
ing a “singular proposition.” 

There are many published studies 
which involve factor analyzing inter- 
correlations between persons. Ac- 
cording to Stephenson’s criteria, how- 
ever, only a few of these would qual- 
ify as an application of Q methodol- 
ogy. Since Stephenson states that 
there is no matrix of correlations 
which can be studied by both R and 
Q methods, one is inclined to con- 
clude that one can appropriately in- 
tercorrelate persons for Q purposes 
only when the similarity of the per- 
sons is expressed by a correlation 
based on ipsative scales, i.e., scales 
on which people have distinguished 
between items and not necessarily 
scales which distinguish between 
people on any normative basis of in- 
dividual differences. Thus, your re- 
viewer's factor analyses of various 
diagnostic groups, although de- 
signed to show that different varie- 
ties of patients may have the same 
diagnosis, should not be considered 
as application of Q methodology be- 
cause the correlations between per- 
sons were based on standard rating 
scales designed to show individual 
differences. There are many such 
obverse factor analyses, and although 
they are commonly called Q studies, 
they do not meet Stephenson’s cri- 
teria. The Bendig and Hamlin 
(1955) investigation of Rorschach 
scoring categories is another study 
of this type. 

Perhaps the most valuable ap- 
plications of factor analysis in Q 
methodology may come from studies 
of therapeutic phenomena. The possibilities 
of such an approach were 
anticipated as early as 1951, when 
Fiedler published a factor analytic 
study of differences between thera- 
pists from different schools and with 
different levels of training. Despite 
the potential of such studies for help- 
ing to place psychotherapy on a ra- 
tional, empirically verifiable basis, 
only a few students of psychotherapy 
appear to be ready to study thera- 
peutic phenomena with the sys- 
tematic planfulness which Q method- 
ology could facilitate. In one such 
study, Nunnally (1955b) had a therapist 
describe a patient by means of 
Q sorts on eight successive occasions. 
The factor analysis of these inter- 
correlations yielded two factors—one 
concerning relationships with the 
therapist and the other relating to 
intrapersonal confidence. 

The Peterson, Snyder, Guthrie, 
and Ray (1958) investigation of 
therapeutic biases provides a promising 
exploration. They approached 
their study in a sound manner by 
thoughtfully structuring the sample 
of statements which comprised their 
Q sorts, systematically using varia- 
tions of such hypothetical dimensions 
as direction of gain, attitudes, mode 
of change, and area of conflict. The 
sample of therapists who were inter- 
correlated was drawn from graduates 
of their own program so that one is 
not left up in the air with respect to 
the population of persons to whom 
the results may be generalized. As 
is usual in such studies, the factors 
were interpreted on the basis of the 
items which received a characteristic 
sort by persons who had high load- 
ings on the factor. The practice of 
interpreting persons in terms of items 
smacks of R methodology and re- 
minds us that people are usually 
more distinguished by their behavior 
than behavior is distinguished by the 
people who perform it. 

Thrush published an interesting 
study in 1957. Using a sample of 60 
statements descriptive of problems 
encountered by a counseling agency, 
the staff made sorts of the level and 
kind of service each problem would 
require. These sorts were made in 
1952 and again in 1956. On the basis 
of these sorts, the members of the 
staff were intercorrelated for each of 
the years and the two sets of inter- 
correlations were factor analyzed 
separately. A comparison of the re- 
sults indicated that the emphasis in 
the agency had shifted from voca- 
tional counseling to personal adjust- 
ment counseling. Although studies 
of this kind are illuminating, they re- 
mind us that we have no rigorous 
basis for comparing the results of 
factor analyses to test an exact 
statistical hypothesis. The question 
of how one should generalize from a 
Q-type study is usually disregarded. 
Conger, Sawrey, and Krause (1956) 
point to an aspect of this problem in 
their study of Beck’s “The Six 
Schizophrenias” (1954). 

In commenting upon factor anal- 
ysis in Q methodology, one should re- 
member that Stephenson indicated 
that the correlations should be in 
part expressive of the effect of dif- 
ferent kinds of operations. He in- 
tended that the intercorrelated vari- 
ates should, in some manner or an- 
other, be regarded as dependent 
variables in an experimental sense 
and not merely descriptive dimensions 
of a static situation. From the 
standpoint of this emphasis, the 
Sweetland and Frank (1955) study 
of ideal psychological adjustment is 
not a good example of Q methodology 
because its purpose appears to be to 
describe kinds of psychological ad- 
justment rather than to reveal the 
effects of certain operations, i.e., it is 
not a dependency-type analysis. This 
descriptive use of the Q-type factor 
analysis is not unique to Sweetland 
and Frank, however; other examples 
would include Broen’s (1957) factor 
analytic study of religious attitudes. 

Many of the samples of statements 
which have been sorted in Q meth- 
odology appear to have been some- 
what informally assembled, and as a 
consequence, the analyses performed 
on the sorts provided by various per- 
sons or by the same person under 
various instructions have an uncer- 
tain meaning. We do not know from 
what parent population of behavior 
they might conceivably be drawn or 
from what specific theory they could 
have been generated. It is probably 
for this reason that we find relatively 
few studies where the Q sort arrays 
for an individual or a group of individuals 
are submitted to an analysis 
of variance. This is unfortunate because 
the difference or similarity between 
the Q sorts of two individuals 
or the Q sorts of an individual under 
two or more conditions must be ex- 
plained in terms of the sorted items 
which comprise the Q arrays. If the 
items had been included in the sample 
as a priori representatives of theo- 
retically relevant classes of behavior, 
then the order given to the items 
could in the case of any given Q sort 
be entered into an analysis of vari- 
ance. In this way the relative status 
which the sorter assigned to various 
a priori classes of items could be re- 
vealed. In many studies, however, 
a defensible a priori classification of 
behavior with respect to kinds and 
levels is not possible because the area 
of inquiry is not well known, no sys- 
tematic theory can be confidently ap- 
plied, and in a sense the investigation 
is exploratory. If, in the study of 
such an area of behavior, Q method- 
ology were indicated, it would seem 
desirable first to intercorrelate and 
factor analyze the items in the R 
tradition. Then a sample of state- 
ments for Q sorts could be arranged 
so that the various factors could be 
represented in a balanced design. 
From such structured samples of 
statements, the Q methodology could 
be applied in the recommended man- 
ner by first factor analyzing the 
variates (e.g., people) and then ex- 
plaining the Q factors in terms of an 
analysis of variance of the sorts pro- 
vided by the variates. The reviewer 
saw no studies where the domain of 
behavior was first explored by an 
R-type analysis as a basis for build- 
ing a structured sample of state- 
ments for the Q sorts. Where anal- 
ysis of variance had been applied to 
Q sort arrays, the investigator had 
carefully structured his sample on an 
a priori basis. Such studies are few 
and tend to be found in the recent 
literature. 

One of the earlier studies involving 
an analysis of variance was provided 
in 1956 by Kerlinger who constructed 
a set of Q statements which represent 
two kinds of educational attitudes 
interpreted at four different levels 
each. The levels for each class were 
then systematically replicated with 
10 statements each, so that there 
were 80 statements in all. 

In 1958, Rawn published a study 
of transference and resistance in 
psychotherapy. The statements to 
be sorted conformed with the re- 
quirements of a balanced block de- 
sign involving two levels of resistance 
and three classes of transference. 
These categories of class and level 
could be combined to form six kinds 
of statements. Each of these types of 
statements was interpreted in 15 dif- 
ferent ways to form the replications, 
and accordingly the structured sam- 
ple comprised 90 statements in all. 
These statements were sorted by dif- 
ferent raters and for different sessions 
of recorded psychotherapy. Because 
of the way the sample was structured 
the investigator could perform an 
analysis of variance for the various 
sorts as well as factor analyze the in- 
tercorrelations among the sorts. His 
purposes required the analysis of 
variance only, however. 
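
A structured sample of this kind, and the analysis of variance it permits for a single sort, can be sketched as follows. The design dimensions (two levels, three classes, 15 replications, 90 statements) are those reported above. The class labels, the function names, and the pile values are invented for illustration and are not Rawn's data or procedure. The sums of squares are the ordinary ones for a balanced two-way layout with replication.

```python
from itertools import product

RESISTANCE = ("low", "high")        # two levels
TRANSFERENCE = ("T1", "T2", "T3")   # three classes; labels are placeholders
REPLICATIONS = 15

# One entry per statement: (resistance level, transference class, replicate).
design = list(product(RESISTANCE, TRANSFERENCE, range(REPLICATIONS)))


def invented_pile_value(resistance, transference, replicate):
    """Synthetic pile value for one statement; not data from any study."""
    return (2 * RESISTANCE.index(resistance)
            + TRANSFERENCE.index(transference)
            + replicate % 3)


def two_way_anova(cells):
    """Sums of squares for a balanced two-way layout.

    cells maps (row level, column level) to a list of replicate scores,
    with the same number of replicates in every cell.
    """
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    n = len(next(iter(cells.values())))
    values = [v for cell in cells.values() for v in cell]
    grand = sum(values) / len(values)
    cell_mean = {key: sum(cell) / n for key, cell in cells.items()}
    row_mean = {r: sum(cell_mean[(r, c)] for c in cols) / len(cols) for r in rows}
    col_mean = {c: sum(cell_mean[(r, c)] for r in rows) / len(rows) for c in cols}
    ss_rows = len(cols) * n * sum((m - grand) ** 2 for m in row_mean.values())
    ss_cols = len(rows) * n * sum((m - grand) ** 2 for m in col_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_within = sum((v - cell_mean[key]) ** 2
                    for key, cell in cells.items() for v in cell)
    return {
        "rows": ss_rows,
        "columns": ss_cols,
        "interaction": ss_cells - ss_rows - ss_cols,
        "within": ss_within,
        "total": sum((v - grand) ** 2 for v in values),
    }


# Collect the pile values of one (synthetic) sort into the six cells.
cells = {}
for resistance, transference, replicate in design:
    cells.setdefault((resistance, transference), []).append(
        invented_pile_value(resistance, transference, replicate))

table = two_way_anova(cells)
print(len(design), table)
```

Each mean square would be obtained by dividing a sum of squares by its degrees of freedom (1, 2, 2, and 84 for this layout), after which the effects of level and class can be tested against the variation among replications.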

Perhaps some of the most sub- 
stantial values to accrue from the 
point of view known as Q method- 
ology may lie in the fact that more of 
us have become increasingly thought- 
ful about many matters which we 
had formerly disregarded or post- 
poned. Possibly one of these neg- 
lected matters is the hiatus between 
the clinicians who continue to be 
interested in intra-individual differences 
and the psychometricians who, 
acting from the standpoint of in- 
terindividual concepts of reliability, 
have dismissed intra-individual dif- 
ferences as trivial or of no possible 
consequence. 

There is a general tendency for in- 
vestigators to compute correlation 
coefficients without giving much 
thought to the meaning or the determiners 
of the relationship. Q 
methodology is leading us to think 
more realistically about features 
which contribute to the degree of 
correlation between either subjects or 
items. If, for example, the sample of 
items is not homogeneous, it would 
seem possible for several pairs of 
persons to be equally correlated with 
each other but for the various pairs 
to have their respective correlations 
because of different items of behavior. 
As a consequence, none of the items 
may characterize all of the intercor- 
related persons. Presumably a sim- 
ilar kind of situation could exist if 
items were intercorrelated for a 
group of persons representing sub- 
samples of different populations. In 
such a case the correlations found 
between any two items might vary 
considerably if they were separately 
calculated for the various subsamples 
instead of being calculated for the 
heterogeneous group. Obviously, the 
investigator is on shaky ground when 
he assumes that a correlation based 
on one sample is descriptive of some 
other sample which is comprised in 
some different manner. The com- 
position of the sample with respect to 
persons can obviously influence the 
correlation between items, or the com- 
position of a sample with respect to 
items could influence the correlation 
between persons. 
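
The dependence of an item correlation on the composition of the sample can be shown with a small numerical sketch. The scores are invented. Two subsamples are constructed so that the two items are perfectly correlated within each, while the correlation for the pooled, heterogeneous group is negative.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant list")
    return sxy / sqrt(sxx * syy)


# Invented scores on two items for two subsamples of three persons each.
item1_a, item2_a = [1, 2, 3], [7, 8, 9]   # subsample A
item1_b, item2_b = [7, 8, 9], [1, 2, 3]   # subsample B

r_a = pearson_r(item1_a, item2_a)
r_b = pearson_r(item1_b, item2_b)
r_pooled = pearson_r(item1_a + item1_b, item2_a + item2_b)
print(round(r_a, 2), round(r_b, 2), round(r_pooled, 2))
```

The same reasoning applies with persons and items interchanged: the correlation between two persons depends on which items are included in the sample.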

Some aspects of this problem of 
subject homogeneity were discussed 
by Block in 1955, and in the same 
year Nunnally (1955a) described an 
hypothetical matrix where sample 
heterogeneity with respect to persons 
resulted in very low correlations be- 
tween variables while the obverse 
type correlations between the in- 
dividuals were very high. Nunnally 
implied that ipsative scores are par- 
ticularly valuable in yielding Q-type 
correlations which could reveal trends 
not apparent from R-type analyses. 
The way in which this matter de- 
pends upon the homogeneity of sam- 
ples and the way in which it may be 
related to type of scale were not 
made explicit, however. 
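
That the two kinds of correlation can diverge for one and the same score matrix is easily shown. The matrix below is invented and is not Nunnally's. It is merely a small case, with four persons and three variables, in which the correlations between variables (R-type) are zero while the correlations between persons (Q-type) are nearly perfect.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant list")
    return sxy / sqrt(sxx * syy)


# Invented matrix: rows are persons, columns are variables.
scores = [
    [11, 51, 91],
    [11, 49, 89],
    [9, 51, 89],
    [9, 49, 91],
]
columns = [list(col) for col in zip(*scores)]

# R-type: correlate variables over persons. Q-type: correlate persons over variables.
r_type = [pearson_r(columns[i], columns[j]) for i, j in combinations(range(3), 2)]
q_type = [pearson_r(scores[i], scores[j]) for i, j in combinations(range(4), 2)]

print([round(r, 3) for r in r_type])
print([round(q, 3) for q in q_type])
```

Here the persons share a common profile across the variables, and the small departures from that profile are unrelated from one variable to the next. Whether a comparable result depends upon ipsative scaling is not settled by an example of this kind.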

The growing interest in Q proce- 
dure has generated several method- 
ological studies. Cohen (1957) has 
prepared a monograph which permits 
the investigator to read correlation 
coefficients between Q sorts, and 
Creaser (1955) has recommended a 
way for determining the amount an 
item should be weighted with respect 
to a given factor. 

Goodling and Guthrie (1956) point 
out that the sample of items for Q 
sort should be selected in such a way 
as to provide maximum intersubject 
variability and minimum intrasub- 
ject variability. The question of in- 
trasubject variability is one aspect of 
the reliability question, and this has 
been attacked directly by some in- 
vestigators. For example, Hilden 
(1958) describes a sampling experi- 
ment where he begins with a universe 
of 1,575 statements from which he 
has randomly drawn 20 samples of 50 
statements each. He had four grad- 
uate students provide self-ideal sorts 
for each of the 20 random sets and 
for the total population as well. The 
various scores, e.g., self-ideal, from 
any one set were correlated with 
each other, and the respective corre- 
lations were determined for the popu- 
lation. When the correlations for the 
random sets were compared with the 
correlations for the parent popula- 
tion, no reliable differences were 
found. From this one might infer 
that when using items such as these, 
a sample of 50 statements may be 
sufficient for Q-sort purposes. 
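
The sampling design Hilden describes can be sketched in a few lines. This is only an illustration under stated assumptions: the scores are simulated (no real Q-sort data are used), the 20 samples are assumed to be drawn independently of one another, and the names `draw_item_samples` and `simulate_scores` are hypothetical helpers, not part of Hilden's procedure.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def draw_item_samples(universe_size=1575, n_samples=20, sample_size=50, seed=0):
    """Draw n_samples random item sets (no repeats within a set) from the universe."""
    rng = random.Random(seed)
    return [rng.sample(range(universe_size), sample_size) for _ in range(n_samples)]


def simulate_scores(universe_size=1575, seed=1):
    """Simulated self and ideal placements for every statement (assumption, not data)."""
    rng = random.Random(seed)
    self_scores = [rng.gauss(0, 1) for _ in range(universe_size)]
    ideal_scores = [0.6 * s + 0.8 * rng.gauss(0, 1) for s in self_scores]
    return self_scores, ideal_scores


samples = draw_item_samples()
self_scores, ideal_scores = simulate_scores()

# Self-ideal correlation over the whole universe of statements.
r_population = pearson_r(self_scores, ideal_scores)

# Self-ideal correlation within each random set of 50 statements.
r_samples = [
    pearson_r([self_scores[i] for i in s], [ideal_scores[i] for i in s])
    for s in samples
]

print(round(r_population, 2), round(min(r_samples), 2), round(max(r_samples), 2))
```

Comparing the spread of `r_samples` with `r_population` mirrors the comparison of random sets with the parent population described above.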

There appears to be a general tend- 
ency among investigators to require 
their subjects to distribute their Q 
sorts in a quasi-normal fashion. This 
is in spite of the fact that Stephenson 
had recommended a flattened, bell- 
shaped distribution and that subse- 
quent investigators had questioned 
the desirability of quasi-normal dis- 
tributions. Jones (1956), for example, 
had noted that the free sorts of vari- 
ous groups differed appreciably from 
each other and that no group se- 
lected a bell-shaped distribution. 
Livson and Nichols (1956) had ex- 
amined this problem from the stand- 
point of the number of discrimina- 
tions that various shaped distribu- 
tions involve, and noted that the 
more discriminations required, the 
greater the test-retest reliability of 
the sort. On the basis of this finding, 
these authors recommend that the Q- 
sort distribution should be rectangu- 
lar. The issue of forced vs. unforced 
sorts has been discussed in numerous 
contexts, and no final agreement seems 
to have been reached. For example, 
Jones points out that there is no one 
preferred distribution, and Block 
(1956) believes, on the basis of his 
comparisons, that the forced sort 
method is equal to or superior to free 
sorts. 
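
The argument about the number of discriminations can be made concrete with a small calculation. The sketch below assumes that a "discrimination" is counted as any pair of items placed in different categories; this counting rule and both pile layouts are illustrative assumptions, not figures taken from Livson and Nichols.

```python
def discriminations(pile_sizes):
    """Count item pairs that a forced sort places in different piles."""
    n = sum(pile_sizes)
    # All ordered pairs minus same-pile ordered pairs, halved to get unordered pairs.
    return (n * n - sum(k * k for k in pile_sizes)) // 2


# Hypothetical 100-item, 9-pile forced distributions.
quasi_normal = [2, 5, 12, 19, 24, 19, 12, 5, 2]
rectangular = [11, 11, 11, 11, 12, 11, 11, 11, 11]

print(discriminations(quasi_normal))  # 4178
print(discriminations(rectangular))   # 4444
```

Under this counting rule, the flatter the distribution, the more pairs of items the sorter must tell apart, which is the property the rectangular recommendation rests on.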

Whether Q methodology will, as 
Stephenson proposed, create a psy- 
chology of the individual remains to 
be seen. From the standpoint of 
psychometry with its emphasis on in- 
dividual differences or from the 
standpoint of psychoanalysis with its 
avoidance of formal instrumentation, 
Q methodology and the devices it in- 
cludes do not provide an orthodox 
approach to the study of the in- 
dividual. Certainly those particular 
psychologists who profess to be in- 
terested primarily in the individual 
have not rushed to apply this method 
to material which is still handled on 
an anecdotal or case history basis. 
Nevertheless, Q method’s primary 
contributions to psychology appear 
to be in the study of psychotherapy 
and the related study of persons with 
personality disorders, and there are 
indications that this methodological 
emphasis can contribute to a broad 
study of personality and numerous 
related social problems. The growing 
acceptance of this methodological 
emphasis again reminds us that 
psychologists require flexible meth- 
ods for their researches and will not 
wait for any orthodoxy. 

While it is customary to concep- 
tualize psychotherapy as a learning 
process, few therapists accept the full 
implications of this position. Indeed, 
this is best illustrated by the writings 
of the learning theorists themselves. 
Most of our current methods of 
psychotherapy represent an accumu- 
lation of more or less uncontrolled 
clinical experiences and, in many in- 
stances, those who have written 
about psychotherapy in terms of 
learning theory have merely substi- 
tuted a new language; the practice re- 
mains essentially unchanged (Dollard, 
Auld, & White, 1954; Dollard & 
Miller, 1950; Shoben, 1949). 

If one seriously subscribes to the 
view that psychotherapy is a learn- 
ing process, the methods of treatment 
should be derived from our knowl- 
edge of learning and motivation. 
Such an orientation is likely to yield 
new techniques of treatment which, 
in many respects, may differ mark- 
edly from the procedures currently 
in use. 

Psychotherapy rests on a very 
simple but fundamental assumption, 
i.e., human behavior is modifiable 
through psychological procedures. 
When skeptics raise the question, 
“Does psychotherapy work?” they 
may be responding in part to the 
mysticism that has come to surround 
the term. Perhaps the more mean- 
ingful question, and one which avoids 
the surplus meanings associated with 
the term “psychotherapy,” is as fol- 
lows: Can human behavior be modi- 
fied through psychological means and 
if so, what are the learning mecha- 
nisms that mediate behavior change? 


In the sections that follow, some of 
these learning mechanisms will be 
discussed, and studies in which sys- 
tematic attempts have been made to 
apply these principles of learning to 
the area of psychotherapy will be re- 
viewed. Since learning theory itself 
is still somewhat incomplete, the list 
of psychological processes by which 
changes in behavior can occur should 
not be regarded as exhaustive, nor 


COUNTERCONDITIONING 


Of the various treatment methods 
derived from learning theory, those 
based on the principle of counter- 
conditioning have been elaborated in 
greatest detail. Wolpe (1954, 1958, 
1959) gives a thorough account of this 
method, and additional examples of 
cases treated in this manner are pro- 
vided by Jones (1956), Lazarus and 
Rachman (1957), Meyer (1957), and 
Rachman (1959). Briefly, the prin- 
ciple involved is as follows: if strong 
responses which are incompatible 
with anxiety reactions can be made 
to occur in the presence of anxiety 
evoking cues, the incompatible re- 
sponses will become attached to these 
cues and thereby weaken or eliminate 
the anxiety responses. 

The first systematic psychothera- 
peutic application of this method 
was reported by Jones (1924b) in the 
treatment of Peter, a boy who showed 
severe phobic reactions to animals, 
fur objects, cotton, hair, and me- 
chanical toys. Counterconditioning 
was achieved by feeding the child in 
the presence of initially small but 
gradually increasing anxiety-arousing 
stimuli. A rabbit in a cage was placed 
in the room at some distance so as 
not to disturb the boy’s eating. Each 
day the rabbit was brought nearer 
to the table and eventually removed 
from the cage. During the final stage 
of treatment, the rabbit was placed 
on the feeding table and even in 
Peter’s lap. Tests of generalization 
revealed that the fear responses had 
been effectively eliminated, not only 
toward the rabbit, but toward the 
previously feared furry objects as 
well. 

In this connection, it would be in- 
teresting to speculate on the diag- 
nosis and treatment Peter would have 
received had he been seen by Melanie 
Klein (1949) rather than by Mary 
Cover Jones! 

It is interesting to note that while 
both Shoben (1949) and Wolpe 
(1958) propose a therapy based on 
the principle of counterconditioning, 
their treatment methods are radically 
different. According to Shoben, the 
patient discusses and thinks about 
stimulus situations that are anxiety 
provoking in the context of an inter- 
personal situation which simultane- 
ously elicits positive affective re- 
sponses from the patient. The thera- 
peutic process consists in connecting 
the anxiety provoking stimuli, which 
are symbolically reproduced, with 
the comfort reaction made to the 
therapeutic relationship. 

Shoben’s paper represents primar- 
ily a counterconditioning interpreta- 
tion of the behavior changes brought 
about through conventional forms of 
psychotherapy since, apart from high- 
lighting the role of positive emotional 
reactions in the treatment process, 
no new techniques deliberately de- 
signed to facilitate relearning through 
counterconditioning are proposed. 

This is not the case with Wolpe, 
who has made a radical departure 
from tradition. In his treatment, 
which he calls reciprocal inhibition, 
Wolpe makes systematic use of three 
types of responses which are antag- 
onistic to, and therefore inhibitory of, 
anxiety. These are: assertive or ap- 
proach responses, sexual responses, 
and relaxation responses. 

On the basis of historical informa- 
tion, interview data, and psycho- 
logical test responses, the therapist 
constructs an anxiety hierarchy, a 
ranked list of stimuli to which the 
patient reacts with anxiety. In the 
case of desensitization based on re- 
laxation, the patient is hypnotized 
and given relaxation suggestions. He 
is then asked to imagine a scene 
representing the weakest item on the 
anxiety hierarchy and, if the relaxa- 
tion is unimpaired, this is followed 
by having the patient imagine the 
next item on the list, and so on. 
Thus, the anxiety cues are gradually 
increased from session to session until 
the last phobic stimulus can be pre- 
sented without impairing the re- 
laxed state. Through this procedure, 
relaxation responses eventually come 
to be attached to the anxiety evoking 
stimuli. 
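
The stepwise logic of the desensitization procedure can be sketched as follows. This is a schematic illustration only: the stimulus labels and ratings are hypothetical, `stays_relaxed` stands in for the therapist's judgment that relaxation is unimpaired, and the limit of two items per session is an assumption made for the example rather than a rule stated by Wolpe.

```python
def build_hierarchy(ratings):
    """Rank stimuli from weakest to strongest reported anxiety."""
    return sorted(ratings, key=ratings.get)


def run_session(hierarchy, start, stays_relaxed, max_items=2):
    """Present items in order from `start`; advance only while relaxation holds.

    Returns the position at which the next session should begin.
    """
    position = start
    while position < len(hierarchy) and position - start < max_items:
        if not stays_relaxed(hierarchy[position]):
            break  # relaxation impaired: repeat this item next session
        position += 1
    return position


# Hypothetical ratings (higher means more anxiety).
ratings = {"strong scene": 90, "mild scene": 20, "moderate scene": 50}
hierarchy = build_hierarchy(ratings)

position = 0
while position < len(hierarchy):
    position = run_session(hierarchy, position, stays_relaxed=lambda item: True)

print(hierarchy, position)
```

Treatment ends, in this sketch, when the position passes the last and strongest item, corresponding to the point at which the final stimulus no longer disturbs the relaxed state.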

Wolpe reports remarkable thera- 
peutic success with a wide range of 
neurotic reactions treated on this 
counterconditioning principle. He 
also contends that the favorable out- 
comes achieved by the more conven- 
tional psychotherapeutic methods 
may result from the reciprocal in- 
hibition of anxiety by strong positive 
responses evoked in the patient-ther- 
apist relationship. 

Although the counterconditioning 
method has been employed most ex- 
tensively in eliminating anxiety- 
motivated avoidance reactions and 
inhibitions, it has been used with 
some success in reducing maladaptive 
approach responses as well. In the 
latter case, the goal object is re- 
peatedly associated with some form 
of aversive stimulus. 

Raymond (1956), for example, 
used nausea as the aversion experi- 
ence in the treatment of a patient 
who presented a fetish for handbags 
and perambulators which brought 
him into frequent contact with the 
law in that he repeatedly smeared 
mucus on ladies’ handbags and de- 
stroyed perambulators by running 
into them with his motorcycle. 
Although the patient had undergone 
psychoanalytic treatment and was 
fully aware of the origin and the 
sexual significance of his behavior, 
the fetish persisted. 

The treatment consisted of show- 
ing the patient a collection of hand- 
bags, perambulators, and colored il- 
lustrations just before the onset of 
nausea produced by injections of 
apomorphine. The conditioning was 
repeated every 2 hours day and night 
for 1 week plus additional sessions 8 
days and 6 months later. 

Raymond reports that, not only 
was the fetish successfully eliminated, 
but also the patient showed a vast 
improvement in his social (and legal) 
relationships, was promoted to a 
more responsible position in his work, 
and no longer required the fetish fan- 
tasies to enable him to have sexual 
intercourse. 

Nauseant drugs, especially eme- 
tine, have also been utilized as the 
unconditioned stimulus in the aver- 
sion treatment of alcoholism (Thir- 
mann, 1949; Thompson & Bielinski, 
1953; Voegtlen, 1940; Wallace, 1949). 
Usually 8 to 10 treatments in which 
the sight, smell, and taste of alcohol 
are associated with the onset of nausea 
are sufficient to produce abstinence. Of 
1,000 or more cases on whom ade- 
quate follow-up data are reported, 
approximately 60% of the patients 
have been totally abstinent follow- 
ing the treatment. Voegtlen (1940) 
suggests that a few preventive treat- 
ments given at an interval of about 6 
months may further improve the re- 
sults yielded by this method. 

Despite these encouraging findings, 
most psychotherapists are unlikely 
to be impressed since, in their opinion, 
the underlying causes for the alco- 
holism have in no way been modified 
by the conditioning procedure and, 
if anything, the mere removal of the 
alcoholism would tend to produce 
symptom substitution or other ad- 
verse effects. A full discussion of this 
issue will be presented later. In this 
particular context, however, several 
aspects of the Thompson and Bielin- 
ski (1953) data are worth noting. 
Among the alcoholic patients whom 
they treated, six “suffered from men- 
tal disorders not due to alcohol or 
associated deficiency states.” It was 
planned, by the authors, to follow up 
the aversion treatment with psycho- 
therapy for the underlying psychosis. 
This, however, proved unnecessary 
since all but one of the patients, a 
case of chronic mental deterioration, 
showed marked improvement and 
were in a state of remission. 

Max (1935) employed a strong 
electric shock as the aversive stimulus 
in treating a patient who tended to 
display homosexual behavior follow- 
ing exposure to a fetishistic stimulus. 
Both the fetish and the homosexual 
behavior were removed through a 
series of avoidance conditioning ses- 
sions in which the patient was ad- 
ministered shock in the presence of 
the fetishistic object. 

Wolpe (1958) has also reported 
favorable results with a similar pro- 
cedure in the treatment of obsessions. 

A further variation of the counter- 
conditioning procedure has been de- 
veloped by Mowrer and Mowrer 
(1938) for use with enuretic patients. 
The device consists of a wired bed 
pad which sets off a loud buzzer and 
awakens the child as soon as mictu- 
rition begins. Bladder tension thus 
becomes a cue for waking up which, 
in turn, is followed by sphincter con- 
traction. Once bladder pressure be- 
comes a stimulus for the more remote 
sphincter control response, the child 
is able to remain dry for relatively 
long periods of time without waken- 
ing. 

Mowrer and Mowrer (1938) report 
complete success with 30 children 
treated by this method; similarly, 
Davidson and Douglass (1950) 
achieved highly successful results 
with 20 chronic enuretic children (15 
cured, 5 markedly improved); of 5 
cases treated by Morgan and Witmer 
(1939), 4 of the children not only 
gained full sphincter control, but also 
made a significant improvement in 
their social behavior. The one child 
with whom the conditioning approach 
had failed was later found to have 
bladder difficulties which required 
medical attention. 

Some additional evidence for the 
efficacy of this method is provided 
by Martin and Kubly (1955) who ob- 
tained follow-up information from 
118 of 220 parents who had treated 
their children at home with this type 
of conditioning apparatus. In 74% 
of the cases, according to the parents’ 
replies, the treatment was successful. 


EXTINCTION 


“When a learned response is re- 
peated without reinforcement the 
strength of the tendency to perform 
that response undergoes a progressive 
decrease” (Dollard & Miller, 1950). 
Extinction involves the development 
of inhibitory potential which is com- 
posed of two components. The evo- 
cation of any reaction generates reac- 
tive inhibition (I_R) which presumably 
dissipates with time. When reactive 
inhibition (fatigue, etc.) reaches a 
high point, the cessation of activity 
alleviates this negative motivational 
state and any stimuli associated with 
the cessation of the response become 
conditioned inhibitors (sI_R). 
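
The two-component account above appears to follow Hull's formulation, although the passage does not cite Hull directly. On that assumption, the total inhibitory potential is the sum of reactive and conditioned inhibition, and it subtracts from the reaction potential to give the effective reaction potential:

```latex
\dot{I}_R = I_R + {}_{S}I_R ,
\qquad
{}_{S}\bar{E}_R = {}_{S}E_R - \dot{I}_R
```

Here $I_R$ is reactive inhibition, ${}_{S}I_R$ is conditioned inhibition, ${}_{S}E_R$ is reaction potential, and ${}_{S}\bar{E}_R$ is effective reaction potential. The symbols for reaction potential are supplied from Hull's system and do not appear in the text itself.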

One factor that has been shown to 
influence the rate of extinction of 
maladaptive and anxiety-motivated 
behavior is the interval between ex- 
tinction trials. In general, there 
tends to be little diminution in the 
strength of fear-motivated behavior 
when extinction trials are widely 
distributed, whereas under massed 
trials, reactive inhibition builds up 
rapidly and consequently extinction 
is accelerated (Calvin, Clifford, Clif- 
ford, Bolden, & Harvey, 1956; Ed- 
monson & Amsel, 1954). 
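
The reasoning about trial spacing can be illustrated with a toy model. Everything in the sketch is an assumption made for illustration: each trial is taken to add a fixed increment of reactive inhibition, which then decays exponentially during the interval before the next trial. The parameter values are arbitrary and are not drawn from the studies cited.

```python
import math


def reactive_inhibition(n_trials, interval, increment=1.0, decay_time=10.0):
    """Reactive inhibition after n_trials separated by `interval` time units.

    Toy model: inhibition decays exponentially between trials and each
    trial adds a fixed increment.
    """
    level = 0.0
    for _ in range(n_trials):
        level *= math.exp(-interval / decay_time)  # dissipation during the rest
        level += increment                         # build-up from the response
    return level


massed = reactive_inhibition(20, interval=1.0)        # trials close together
distributed = reactive_inhibition(20, interval=60.0)  # trials widely spaced

print(round(massed, 2), round(distributed, 2))
```

With short intervals the increments accumulate faster than they dissipate, whereas with long intervals each trial starts from nearly zero, which is the qualitative contrast between massed and distributed extinction trials described above.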

An illustration of the application of 
this principle is provided by Yates 
(1958) in the treatment of tics. Yates 
demonstrated, in line with the find- 
ings from laboratory studies of ex- 
tinction under massed and distrib- 
uted practice, that massed sessions in 
which the patient performed tics 
voluntarily followed by prolonged 
rest to allow for the dissipation of re- 
active inhibition was the most effec- 
tive procedure for extinguishing the 
tics. 

It should be noted that the extinc- 
tion procedure employed by Yates is 
very similar to Dunlap’s method of 
negative practice, in which the sub- 
ject reproduces the negative behav- 
iors voluntarily without reinforce- 
ment (Dunlap, 1932; Lehner, 1954). 
This method has been applied most 
frequently, with varying degrees of 
success, to the treatment of speech 
disorders (Fishman, 1937; Meissner, 
1946; Rutherford, 1940; Sheehan, 
1951; Sheehan & Voas, 1957). If the 
effectiveness of this psychothera- 
peutic technique is due primarily to 
extinction, as suggested by Yates’ 
study, the usual practice of terminat- 
ing a treatment session before the 
subject becomes fatigued (Lehner, 
1954), would have the effect of reduc- 
ing the rate of extinction, and may in 
part account for the divergent results 
yielded by this method. 

Additional examples of the thera- 
peutic application of extinction pro- 
cedures are provided by Jones (1955), 
and most recently by C. D. Williams 
(1959). 

Most of the conventional forms of 
psychotherapy rely heavily on ex- 
tinction effects although the therapist 
may not label these as such. For 
example, many therapists consider 
permissiveness to be a necessary con- 
dition of therapeutic change (Alex- 
ander, 1956; Dollard & Miller, 1950; 
Rogers, 1951). It is expected that 
when a patient expresses thoughts or 
feelings that provoke anxiety or guilt 
and the therapist does not disap- 
prove, criticize, or withdraw interest, 
the fear or guilt will be gradually 
weakened or extinguished. The ex- 
tinction effects are believed to gen- 
eralize to thoughts concerning related 
topics that were originally inhibited, 
and to verbal and physical forms of 
behavior as well (Dollard & Miller, 
1950). 

Some evidence for the relationship 
between permissiveness and the ex- 
tinction of anxiety is provided in two 
studies recently reported by Dittes 
(1957a, 1957b). In one study (1957b) 
involving an analysis of patient- 
therapist interaction sequences, Dit- 
tes found that permissive responses 
on the part of the therapist were fol- 
lowed by a corresponding decrease in 
the patient’s anxiety (as measured by 
the GSR) and the occurrence of 
avoidance behaviors. A sequential 
analysis of the therapeutic sessions 
(Dittes, 1957a), revealed that, at the 
onset of treatment, sex expressions 
were accompanied by strong anxiety 
reactions; under the cumulative ef- 
fects of permissiveness, the anxiety 
gradually extinguished. 

In contrast to counterconditioning, 
extinction is likely to be a less effec- 
tive and a more time consuming 
method for eliminating maladaptive 
behavior (Jones, 1924a; Dollard & 
Miller, 1950); in the case of conven- 
tional interview therapy, the rela- 
tively long intervals between inter- 
view sessions, and the ritualistic 
adherence to the 50-minute hour may 
further reduce the occurrence of ex- 
tinction effects. 


DISCRIMINATION LEARNING 


Human functioning would be ex- 
tremely difficult and inefficient if a 
person had to learn appropriate be- 
havior for every specific situation he 
encountered. Fortunately, patterns 
of behavior learned in one situation 
will transfer or generalize to other 
similar situations. On the other hand, 
if a person overgeneralizes from one 
situation to another, or if the gen- 
eralization is based on superficial or 
irrelevant cues, behavior becomes 
inappropriate and maladaptive. 

In most theories of psychotherapy, 
therefore, discrimination learning, 
believed to be accomplished through 
the gaining of awareness or insight, 
receives emphasis (Dollard & Miller, 
1950; Fenichel, 1941; Rogers, 1951; 
Sullivan, 1953). It is generally as- 
sumed that if a patient is aware of 
the cues producing his behavior, of 
the responses he is making, and of the 
reasons that he responds the way he 
does, his behavior will become more 
susceptible to verbally-mediated con- 
trol. Voluntarily guided, discrimina- 
tive behavior will replace the auto- 
matic, overgeneralized reactions. 

While this view is widely accepted, 
as evidenced in the almost exclusive 
reliance on interview procedures and 
on interpretative or labeling tech- 
niques, a few therapists (Alexander & 
French, 1946) have questioned the 
importance attached to awareness in 
producing modifications in behavior. 
Whereas most psychoanalysts (Fen- 
ichel, 1941), as well as therapists 
representing other points of view 
(Fromm-Reichmann, 1950; Sullivan, 
1953) consider insight a precondition 
of behavior change, Alexander and 
French consider insight or awareness 
a result of change rather than its 
cause. That is, as the patient’s anxie- 
ties are gradually reduced through 
the permissive conditions of treat- 
ment, formerly inhibited thoughts 
are gradually restored to awareness. 

Evidence obtained through con- 
trolled laboratory studies concerning 
the value of awareness in increasing 
the precision of discrimination has so 
far been largely negative or at least 
equivocal (Adams, 1957; Erikson, 
1958; Razran, 1949). A study by 
Lacy and Smith (1954), in which they 
found aware subjects generalized 
anxiety reactions less extensively 
than did subjects who were unaware 
of the conditioned stimulus, provides 
evidence that awareness may aid dis- 
crimination. However, other aspects 
of their findings (e.g., the magnitude 
of the anxiety reactions to the gen- 
eralization stimuli was greater than 
that to the conditioned stimulus 
itself) indicate the need for replica- 
tion. 

If future research continues to 
demonstrate that awareness exerts 
little influence on the acquisition, 
generalization, and modification of 
behavior, such negative results would 
cast serious doubt on the value of 
currently popular psychotherapeutic 
procedures whose primary aim is the 
development of insight. 


METHODS OF REWARD 


Most theories of psychotherapy 
are based on the assumption that the 
patient has a repertoire of previously 
learned positive habits available to 
him, but that these adaptive patterns 
are inhibited or blocked by compet- 
ing responses motivated by anxiety 
or guilt. The goal of therapy, then, is 
to reduce the severity of the internal 
inhibitory controls, thus allowing the 
healthy patterns of behavior to 
emerge. Hence, the role of the thera- 
pist is to create permissive conditions 
under which the patient’s “normal 
growth potentialities” are set free 
(Rogers, 1951). The fact that most of 
our theories of personality and thera- 
peutic procedures have been devel- 
oped primarily through work with 
oversocialized, neurotic patients may 
account in part for the prevalence of 
this view. 

There is a large class of disorders 
(the undersocialized, antisocial per- 
sonalities whose behavior reflects a 
failure of the socialization process) 
for whom this model of personality 
and accompanying techniques of 
treatment are quite inappropriate 
(Bandura & Walters, 1959; Schmide- 
berg, 1959). Such antisocial person- 
alities are likely to present learning 
deficits; consequently, the goal of 
therapy is the acquisition of second- 
ary motives and the development of 
internal restraint habits. That anti- 
social patients prove unresponsive to 
psychotherapeutic methods develop- 
ed for the treatment of oversocialized 
neurotics has been demonstrated in a 
number of studies comparing pa- 
tients who remain in treatment with 
those who terminate treatment pre- 
maturely (Rubenstein & Lorr, 1956). 
It is for this class of patients that the 
greatest departures from traditional 
treatment methods are needed. 

While counterconditioning, extinc- 
tion, and discrimination learning may 
be effective ways of removing neu- 
rotic inhibitions, these methods may 
be of relatively little value in develop- 
ing new positive habits. Primary and 
secondary rewards in the form of the 
therapist’s interest and approval may 
play an important, if not indispensa- 
ble, role in the treatment process. 
Once the patient has learned to want 
the interest and approval of the 
therapist, these rewards may then be 
used to promote the acquisition of 
new patterns of behavior. For certain 
classes of patients such as schizo- 
phrenics (Atkinson, 1957; Peters, 
1953; Robinson, 1957) and delin- 
quents (Cairns, 1959), who are either 
unresponsive to, or fearful of, social 
rewards, the therapist may have to 
rely initially on primary rewards in 
the treatment process. 

An ingenious study by Peters and 
Jenkins (1954) illustrates the applica- 
tion of this principle in the treatment 
of schizophrenic patients. Chronic 
patients from closed wards were ad- 
ministered subshock injections of 
insulin designed to induce the hunger 
drive. The patients were then en- 
couraged to solve a series of graded 
problem tasks with fudge as the re- 
ward. This program was followed 5 
days a week for 3 months. 

Initially the tasks involved simple 
mazes and obstruction problems in 
which the patients obtained the food 
reward directly upon successful com- 
pletion of the problem. Tasks of 
gradually increasing difficulty were 
then administered involving mul- 
tiple-choice learning and verbal-rea- 
soning problems in which the experi- 
menter personally mediated the pri- 
mary rewards. After several weeks of 
such problem solving activities the 
insulin injections were discontinued 
and social rewards, which by this 
time had become more effective, were 
used in solving interpersonal prob- 
lems that the patients were likely to 
encounter in their daily activities 
both inside and outside the hospital 
setting. 

Comparison of the treated group 
with control groups, designed to iso- 
late the effects of insulin and special 
attention, revealed that the patients 
in the reward group improved signifi- 
cantly in their social relationships in 
the hospital, whereas the patients in 
the control groups showed no such 
change. 

King and Armitage (1958) report a 
somewhat similar study in which 
severely withdrawn schizophrenic pa- 
tients were treated with operant 
conditioning methods; candy and 
cigarettes served as the primary re- 
wards for eliciting and maintaining 
increasingly complex forms of behav- 
ior, i.e., psychomotor, verbal, and 
interpersonal responses. Unlike the 
Peters and Jenkins study, no attempt 
was made to manipulate the level of 
primary motivation. 

An interesting feature of the ex- 
perimental design was the inclusion 
of a group of patients who were 
treated with conventional interview 
therapy, as well as a recreational 
therapy and a no-therapy control 
group. It was found that the operant 
group, in relation to similar patients 
in the three control groups, made 
significantly more clinical improve- 
ment. 

Skinner (1956b) and Lindsley 
(1956) working with adult psychotics, 
and Ferster (1959) working with 
autistic children, have been successful 
in developing substantial amounts of 
reality-oriented behavior in their 
patients through the use of reward. 
So far their work has been concerned 
primarily with the effect of schedules 
of reinforcement on the rate of evoca- 
tion of simple impersonal reactions. 
There is every indication, however, 
that by varying the contingency of 
the reward (e.g., the patient must 
respond in certain specified ways to 
the behavior of another individual in 
order to produce the reward) adap- 
tive interpersonal behaviors can be 
developed as well (Azran & Lindsley, 
1956). 

The effectiveness of social rein- 
forcers in modifying behavior has 
been demonstrated repeatedly in 
verbal conditioning experiments 
(Krasner, 1958; Salzinger, 1959). 
Encouraged by these findings, several 
therapists have begun to experiment 
with operant conditioning as a meth- 
od of treatment in its own right 
(Tilton, 1956; Ullman, Krasner, & 
Collins, in press; R. I. Williams, 
1959); the operant conditioning stud- 
ies cited earlier are also illustrative of 
this trend. 

So far the study of generalization 
and permanence of behavior changes 
brought about through operant con- 
ditioning methods has received rela- 
tively little attention and the scanty 
data available are equivocal (Rogers, 
1960; Sarason, 1957; Weide, 1959). 
The lack of consistency in results is 
hardly surprising considering that 
the experimental manipulations in 
many of the conditioning studies are 
barely sufficient to demonstrate con- 
ditioning effects, let alone generaliza- 
tion of changes to new situations. On 
the other hand, investigators who 
have conducted more intensive rein- 
forcement sessions, in an effort to test 
the efficacy of operant conditioning 
methods as a therapeutic technique, 
have found significant changes in pa- 
tients’ interpersonal behavior in ex- 
tra-experimental situations (King & 
Armitage, 1958; Peters & Jenkins, 
1954; Ullman et al., in press). These 
findings are particularly noteworthy 
since the response classes involved 
are similar to those psychotherapists 
are primarily concerned with modifying 
through interview forms of treatment. 
If the favorable results yielded by 
these studies are replicated in future 
investigations, it is likely that the 
next few years will witness an increas- 
ing reliance on conditioning forms of 
psychotherapy, particularly in the 
treatment of psychotic patients. 

At this point it might also be noted 
that, consistent with the results from 
verbal conditioning experiments, con- 
tent analyses of psychotherapeutic 
interviews (Bandura, Lipsher, & 
Miller, 1960; Murray, 1956) suggest 
that many of the changes observed in 
psychotherapy, at least insofar as the 
patients’ verbal behavior is con- 
cerned, can be accounted for in terms 
of the therapists’ direct, although 
usually unwitting, reward and pun- 
ishment of the patients’ expressions. 


PUNISHMENT 


While positive habits can be read- 
ily developed through reward, the 
elimination of socially disapproved 
habits, which becomes very much an 
issue in the treatment of antisocial 
personalities, poses a far more com- 
plex problem. 

The elimination of socially disap- 
proved behaviors can be accom- 
plished in several ways. They may 
be consistently unrewarded and thus 
extinguished. However, antisocial 
behavior, particularly of an extreme 
form, cannot simply be ignored in 
the hope that it will gradually extin- 
guish. Furthermore, since the suc- 
cessful execution of antisocial acts 
may bring substantial material re- 
wards as well as the approval and 
admiration of associates, it is ex- 
tremely unlikely that such behavior 
would ever extinguish. 

Although punishment may lead to 
the rapid disappearance of socially 
disapproved behavior, its effects are 
far more complex (Estes, 1944; 
Solomon, Kamin, & Wynne, 1953). 
If a person is punished for some so- 
cially disapproved habit, the impulse 
to perform the act becomes, through 
its association with punishment, a 
stimulus for anxiety. This anxiety 
then motivates competing responses 
which, if sufficiently strong, prevent 
the occurrence of, or inhibit, the dis- 
approved behavior. Inhibited re- 
sponses may not, however, thereby 
lose their strength, and may reappear 
in situations where the threat of 
punishment is weaker. Punishment 
may, in fact, prevent the extinction
of a habit; if a habit is completely 
inhibited, it cannot occur and there- 
fore cannot go unrewarded. 

Several other factors point to the 
futility of punishment as a means of 
correcting many antisocial patterns. 
The threat of punishment is very 
likely to elicit conformity; indeed, 
the patient may obligingly do what- 
ever he is told to do in order to avoid 
immediate difficulties. This does not 
mean, however, that he has acquired 
a set of sanctions that will be of 
service to him once he is outside the 
treatment situation. In fact, rather 
than leading to the development of 
internal controls, such methods are
likely only to increase the patient’s
reliance on external restraints. Moreover,
under these conditions, the
majority of patients will develop the 
attitude that they will do only what 
they are told to do—and then often 
only half-heartedly—and that they 
will do as they please once they are 
free from the therapist’s supervision 
(Bandura & Walters, 1959). 

In addition, punishment may serve 
only to intensify hostility and other 
negative motivations and thus may 
further instigate the antisocial person 
to display the very behaviors that 
the punishment was intended to 
bring under control. 

Mild aversive stimuli have been 
utilized, of course, in the treatment 
of voluntary patients who express a 
desire to rid themselves of specific 
debilitating conditions. 

Liversedge and Sylvester (1955), 
for example, successfully treated
seven cases of writer’s cramp by
means of a retraining procedure in- 
volving electric shock. In order to 
remove tremors, one component of 
the motor disorder, the patients were 
required to insert a stylus into a series 
of progressively smaller holes; each 
time the stylus made contact with 
the side of the hole the patients re- 
ceived a mild shock. The removal of 
the spasm component of the disorder 
was obtained in two ways. First, the 
patients traced various line patterns 
(similar to the movements required 
in writing) on a metal plate with a 
stylus, and any deviation from the 
path produced a shock. Following 
training on the apparatus, the sub- 
jects then wrote with an electrified 
pen which delivered a shock when- 
ever excessive thumb pressure was 
applied. 

Liversedge and Sylvester report 
that following the retraining the pa- 
tients were able to resume work; a 
follow-up several months later indi- 
cated that the improvement was 
being maintained. 

The aversive forms of therapy, de- 
scribed earlier in the section on 
counterconditioning procedures, also 
make use of mild punishment. 


SOCIAL IMITATION


Although a certain amount of 
learning takes place through direct 
training and reward, a good deal of a 
person’s behavior repertoire may be 
acquired through imitation of what 
he observes in others. If this is the 
case, social imitation may serve as an 
effective vehicle for the transmission 
of prosocial behavior patterns in the 
treatment of antisocial patients. 

Merely providing a model for imi- 
tation is not, however, sufficient. 
Even though the therapist exhibits 
the kinds of behaviors that he wants 
the patient to learn, this is likely to 
have little influence on him if he
rejects the therapist as a model.
Affectional nurturance is believed to 
be an important precondition for 
imitative learning to occur, in that 
affectional rewards increase the sec- 
ondary reinforcing properties of the 
model, and thus predispose the imi- 
tator to pattern his behavior after 
the rewarding person (Mowrer, 1950; 
Sears, 1957; Whiting, 1954). Some 
positive evidence for the influence of 
social rewards on imitation is pro- 
vided by Bandura and Huston (in 
press) in a recent study of identifica- 
tion as a process of incidental imita- 
tion. 

In this investigation preschool chil- 
dren performed an orienting task 
but, unlike most incidental learning 
studies, the experimenter performed 
the diverting task as well, and the 
extent to which the subjects pat- 
terned their behavior after that of the 
experimenter-model was measured. 

A two-choice discrimination prob- 
lem similar to the one employed by 
Miller and Dollard (1941) in their 
experiments of social imitation was 
used as the diverting task. On each 
trial, one of two boxes was loaded 
with two rewards (small multicolor 
pictures of animals) and the object 
of the game was to guess which box 
contained the stickers. The experi- 
menter-model (M) always had her 
turn first and in each instance chose 
the reward box. During M’s trial, 
the subject remained at the starting 
point where he could observe the 
M’s behavior. On each discrimina- 
tion trial M exhibited certain verbal, 
motor, and aggressive patterns of 
behavior that were totally irrelevant 
to the task to which the subject’s at- 
tention was directed. At the starting 
point, for example, M made a verbal 
response and then marched slowly 
toward the box containing the stickers,
repeating, “March, march, march.”
On the lid of each box was a
rubber doll which M knocked off
aggressively when she reached the
designated box. She then paused
briefly, remarked, “Open the box,”
removed one sticker, and pasted it on 
a pastoral scene which hung on the 
wall immediately behind the boxes. 
The subject then took his turn and 
the number of M’s behaviors per- 
formed by the subject was recorded. 

A control group was included in
order (a) to provide a check on
whether the subjects’ performances
reflected genuine imitative learning
or merely the chance occurrence of
behaviors high in the subjects’ response
hierarchies, and (b) to determine
whether subjects would adopt
certain aspects of M’s behavior which 
involved considerable delay in re- 
ward. With the controls, therefore, 
M walked to the box, choosing a 
highly circuitous route along the sides 
of the experimental room; instead of 
aggressing toward the doll, she lifted 
it gently off the container. 

The results of this study indicate 
that, insofar as preschool children 
are concerned, a good deal of incidental
imitation of the behaviors displayed
by an adult model does occur.
Of the subjects in the experimental 
group, 88% adopted the M’s aggres- 
sive behavior, 44% imitated the 
marching, and 28% reproduced M’s 
verbalizations. In contrast, none of 
the control subjects behaved aggres- 
sively, marched, or verbalized, while 
75% of the controls imitated the 
circuitous route to the containers. 

In order to test the hypothesis that 
children who experience a rewarding 
relationship with an adult model 
adopt more of the model’s behavior 
than do children who experience a 
relatively distant and cold relation- 
ship, half the subjects in the experi- 
ment were assigned to a nurturant 
condition; the other half of the sub- 
jects to a nonnurturant condition.
During the nurturant sessions, which
preceded the incidental learning, M
played with the subject, responded
readily to the subject’s bids for attention,
and in other ways fostered a
consistently warm and rewarding
interaction with the child. In con- 
trast, during the nonnurturant ses- 
sions, the subject played alone while 
M busied herself with paperwork at a 
desk in the far corner of the room. 

Consistent with the hypothesis, it 
was found that subjects who experi- 
enced the rewarding interaction with 
M adopted significantly more of M’s 
behavior than did subjects who were 
in the nonnurturance condition. 

A more crucial test of the transmis- 
sion of behavior patterns through the 
process of social imitation involves 
the delayed generalization of imita- 
tive responses to new situations in 
which the model is absent. A study
of this type, just completed, provides
strong evidence that observation of 
the cues produced by the behavior of 
others is an effective means of elicit- 
ing responses for which the original 
probability is very low (Bandura, 
Ross, & Ross, in press). 

Empirical studies of the correlates 
of strong and weak identification 
with parents lend additional support
to the theory that rewards promote 
imitative learning. Boys whose 
fathers are highly rewarding and 
affectionate have been found to adopt 
the father-role in doll-play activities 
(Sears, 1953), to show father-son 
similarity in response to items on a 
personality questionnaire (Payne & 
Mussen, 1956), and to display mascu- 
line behaviors (Mussen & Distler, 
1956, 1960) to a greater extent than 
boys whose fathers are relatively cold 
and nonrewarding. 

The treatment of older unsocialized 
delinquents is a difficult task, since 
they are relatively self-sufficient and 
do not readily seek involvement with
a therapist. In many cases, socialization
can be accomplished only
through residential care and treatment.
In the treatment home, the
therapist can personally administer 
many of the primary rewards and 
mediate between the boys’ needs and 
gratifications. Through the repeated 
association with rewarding experi- 
ences for the boy, many of the thera- 
pist’s attitudes and actions will 
acquire secondary reward value, and 
thus the patient will be motivated to 
reproduce these attitudes and actions 
in himself. Once these attitudes and 
values have been thus accepted, the 
boy’s inhibition of antisocial tenden- 
cies will function independently of 
the therapist. 

While treatment through social
imitation has been suggested as a
method for modifying antisocial patterns,
it can be an effective procedure
for the treatment of other forms of 
disorders as well. Jones (1924a), for 
example, found that the social ex- 
ample of children reacting normally 
to stimuli feared by another child was 
effective, in some instances, in elimi- 
nating such phobic reactions. In 
fact, next to counterconditioning, the 
method of social imitation proved to 
be most effective in eliminating inap- 
propriate fears. 

There is some suggestive evidence 
that by providing high prestige 
models and thus increasing the rein- 
forcement value of the imitatee’s 
behavior, the effectiveness of this
method in promoting favorable adjustive
patterns of behavior may be
further increased (Jones, 1924a;
Mausner, 1953, 1954; Miller & Dollard,
1941).

During the course of conventional
psychotherapy, the patient is exposed
to many incidental cues involving
the therapist’s values, attitudes, and
patterns of behavior. They are incidental
only because they are usually
considered secondary or irrelevant to
the task of resolving the patient’s 
problems. Nevertheless, some of the 
changes observed in the patient’s be- 
havior may result, not so much from 
the intentional interaction between 
the patient and the therapist, but 
rather from active learning by the 
patient of the therapist’s attitudes 
and values which the therapist never 
directly attempted to transmit. This 
is partially corroborated by Rosen- 
thal (1955) who found that, in spite 
of the usual precautions taken by 
therapists to avoid imposing their 
values on their clients, the patients 
who were judged as showing the 
greatest improvement changed their 
moral values (in the areas of sex, 
aggression, and authority) in the di- 
rection of the values of their thera- 
pists, whereas patients who were un- 
improved became less like the thera- 
pist in values. 


FACTORS IMPEDING INTEGRATION


In reviewing the literature on psy- 
chotherapy, it becomes clearly evi- 
dent that learning theory and general 
psychology have exerted a remark- 
ably minor influence on the practice 
of psychotherapy and, apart from the 
recent interest in Skinner’s operant 
conditioning methods (Krasner, 1955;
Skinner, 1953), most of the recent 
serious attempts to apply learning 
principles to clinical practice have 
been made by European psychothera- 
pists (Jones, 1956; Lazarus & Rach- 
man, 1957; Liversedge & Sylvester, 
1955; Meyer, 1957; Rachman, 1959;
Raymond, 1956; Wolpe, 1958; Yates, 
1958). This isolation of the methods 
of treatment from our knowledge of 
learning and motivation will continue 
to exist for some time since there are 
several prevalent attitudes that impede
adequate integration.

In the first place, the deliberate use 
of the principles of learning in the
modification of human behavior implies,
for most psychotherapists,
manipulation and control of the pa- 
tient, and control is seen by them as 
antihumanistic and, therefore, bad. 
Thus, advocates of a learning ap- 
proach to psychotherapy are often 
charged with treating human beings 
as though they were rats or pigeons 
and with leading us down the road to Orwell's
1984. 

This does not mean that psycho- 
therapists do not influence and con- 
trol their patients’ behavior. On the 
contrary. In any interpersonal inter- 
action, and psychotherapy is no ex- 
ception, people influence and control 
one another (Frank, 1959; Skinner, 
1956a). Although the patient’s con- 
trol of the therapist has not as yet 
been studied (such control is evident 
when patients subtly reward the 
therapist with interesting historical 
material and thereby avoid the dis- 
cussion of their current interpersonal 
problems), there is considerable evi- 
dence that the therapist exercises 
personal control over his patients. 
A brief examination of interview 
protocols of patients treated by thera- 
pists representing differing theoretical 
orientations clearly reveals that the
patients have been thoroughly condi- 
tioned in their therapists’ idiosyn- 
cratic languages. Client-centered 
patients, for example, tend to produce 
the client-centered terminology, the- 
ory, and goals, and their interview 
content shows little or no overlap 
with that of patients seen in psycho- 
analysis who, in turn, tend to speak 
the language of psychoanalytic the- 
ory (Heine, 1950). Even more direct 
evidence of the therapists’ controlling 
influence is provided in studies of 
patient-therapist interactions (Ban- 
dura et al., 1960; Murray, 1956; 
Rogers, 1960). The results of these 
studies show that the therapist not 
only controls the patient by rewarding
him with interest and approval
when the patient behaves in a fashion 
the therapist desires, but that he also 
controls through punishment, in the 
form of mild disapproval and with- 
drawal of interest, when the patient 
behaves in ways that are threatening 
to the therapist or run counter to his 
goals. 

One difficulty in understanding 
the changes that occur in the course 
of psychotherapy is that the inde- 
pendent variable, i.e., the therapist’s 
behavior, is often vaguely or only 
partially defined. In an effort to 
minimize or to deny the therapist’s 
directive influence on the patient, 
the therapist is typically depicted as 
a “catalyst” who, in some mysterious
way, sets free positive adjustive pat- 
terns of behavior or similar outcomes 
usually described in very general and 
highly socially desirable terms. 

It has been suggested, in the material
presented in the preceding
sections, that many of the changes
that occur in psychotherapy derive
from the unwitting application of
well-known principles of learning.
However, the occurrence of the neces- 
sary conditions for learning is more by 
accident than by intent and, per- 
haps, a more deliberate application of 
our knowledge of the learning process 
to psychotherapy would yield far 
more effective results. 

The predominant approach in the 
development of psychotherapeutic 
procedures has been the “school” 
approach. A similar trend is noted in 
the treatment methods being derived 
from learning theory. Wolpe, for 
example, has selected the principle of 
counterconditioning and built a 
“school” of psychotherapy around it;
Dollard and Miller have focused on 
extinction and discrimination learn- 
ing; and the followers of Skinner rely 
almost entirely on methods of re- 
ward. This stress on a few learning
principles at the expense of neglecting
other relevant ones will serve
only to limit the effectiveness of 
psychotherapy. 

A second factor that may account 
for the discontinuity between general 
psychology and psychotherapeutic 
practice is that the model of person- 
ality to which most therapists sub- 
scribe is somewhat dissonant with 
the currently developing principles 
of behavior. 

In their formulations of personal- 
ity functioning, psychotherapists are 
inclined to appeal to a variety of 
inner explanatory processes. In con- 
trast, learning theorists view the 
organism as a far more mechanistic 
and simpler system, and consequently 
their formulations tend to be ex- 
pressed for the most part in terms of 
antecedent-consequent relationships 
without reference to inner states. 


Symptoms are learned S-R connections; 
once they are extinguished or deconditioned 
treatment is complete. Such treatment is 
based exclusively on present factors; like 
Lewin's theory, this one is a-historical. Non- 
verbal methods are favored over verbal ones, 
although a minor place is reserved for verbal 
methods of extinction and reconditioning. 
Concern is with function, not with content. 
The main difference between the two theories 
arises over the question of “symptomatic” 
treatment. According to orthodox theory, 
this is useless unless the underlying complexes 
are attacked. According to the present 
theory, there is no evidence for these putative 
complexes, and symptomatic treatment is all 
that is required (Eysenck, 1957, pp. 267-268). 
(Quoted by permission of Frederick A. Praeger, 
Inc.) 


Changes in behavior brought about 
through such methods as counter- 
conditioning are apt to be viewed by 
the “dynamically-oriented” therapist
as being not only superficial,
“symptomatic” treatment, in that 
the basic underlying instigators of 
the behavior remain unchanged, but 
also potentially dangerous, since the 
direct elimination of a symptom may
precipitate more seriously disturbed
behavior. 

This expectation receives little 
support from the generally favorable 
outcomes reported in the studies re- 
viewed in this paper. In most cases 
where follow-up data were available 
to assess the long-term effects of the 
therapy, the patients, many of whom 
had been treated by conventional 
methods with little benefit, had evi- 
dently become considerably more 
effective in their social, vocational, 
and psychosexual adjustment. On 
the whole the evidence, while open to 
error, suggests that no matter what 
the origin of the maladaptive behav- 
ior may be, a change in behavior 
brought about through learning pro- 
cedures may be all that is necessary 
for the alleviation of most forms of 
emotional disorders. 

As Mowrer (1950) very aptly
points out, the “symptom-underlying
cause” formulation may represent
inappropriate medical analogizing.
Whether or not a given behavior
will be considered normal or a symptom
of an underlying disturbance will
depend on whether or not somebody
objects to the behavior. For example,
aggressiveness on the part of
children may be encouraged and con- 
sidered a sign of healthy develop- 
ment by the parents, while the same 
behavior is viewed by school au- 
thorities and society as a symptom of 
a personality disorder (Bandura & 
Walters, 1959). Furthermore, behavior
considered to be normal at one
stage in development may be
regarded as a “symptom of a personality
disturbance” at a later period.
In this connection it is very
appropriate to repeat Mowrer’s (1950)
query: “And when does persisting
behavior of this kind suddenly cease
to be normal and become a symptom?”
(p. 474).

Thus, while a high fever is generally 
considered a sign of an underlying
disease process regardless of when or
where it occurs, whether a specific be- 
havior will be viewed as normal or as 
a symptom of an underlying pathol- 
ogy is not independent of who makes 
the judgement, the social context in 
which the behavior occurs, the age of 
the person, as well as many other fac- 
tors. 

Another important difference between
physical pathology and behavior
pathology usually overlooked
is that, in the case of most behavior 
disorders, it is not the underlying 
motivations that need to be altered 
or removed, but rather the ways in 
which the patient has learned to 
gratify his needs (Rotter, 1954). 
Thus, for example, if a patient dis- 
plays deviant sexual behavior, the 
goal is not the removal of the under- 
lying causes, i.e., sexual motivation, 
but rather the substitution of more 
socially approved instrumental and 
goal responses. 

It might also be mentioned, in
passing, that in the currently popular
forms of psychotherapy, the role
assumed by the therapist may bring 
him a good many direct or fantasied 
personal gratifications. In the course 
of treatment the patient may express 
considerable affection and admiration 
for the therapist, he may assign the 
therapist an omniscient status, and 
the reconstruction of the patient’s 
history may be an intellectually
stimulating activity. On the other 
hand, the methods derived from 
learning theory place the therapist in 
a less glamorous role, and this in it- 
self may create some reluctance on 
the part of psychotherapists to part 
with the procedures currently in use. 

Which of the two conceptual theo- 
ries of personality—the psychody- 
namic or the social learning theory— 
is the more useful in generating effec- 
tive procedures for the modification 
of human behavior remains to be 
demonstrated. While it is possible to
present logical arguments and impressive
clinical evidence for the efficiency
of either approach, the best
proving ground is the laboratory. 

In evaluating psychotherapeutic 
methods, the common practice is to 
compare changes in a treated group 
with those of a nontreated control 
group. One drawback of this ap- 
proach is that, while it answers the 
question as to whether or not a par- 
ticular treatment is more effective
than no intervention in producing
changes along specific dimensions for 
certain classes of patients, it does not

Exploration of personality by multivariate
experimental methods, as a
means of objectively determining 
personality structure, has revealed, 
on the one hand, an array of stable, 
meaningful, cross-checking structures 
(Cattell, 1946, 1957; French, 1953), 
and on the other, some baffling in- 
consistencies. The latter have re- 
cently been pointed out by Becker 
(1960), apparently in criticism of the 
present writer’s personality theory,
but have been known for several 
years, and were, in fact, first brought 
to light by Cattell and Saunders 
(1950). Nevertheless, Becker does a 
service to advertise these facts; for 
psychologists have greatly neglected 
the solution of the problems revealed 
in this field. 

The present writer’s theoretical 
position is that it is conceptually cor- 
rect to speak of the same unique 
source trait, e.g., cyclothymia-schizo- 
thymia, anxiety, ego-strength, sur- 
gency-desurgency, as something ex- 
pressing itself (in terms of recogniz- 
able, replicable factor patterns) 
across all three possible media of 
experimental observation. That is to 
say, the same influence should appear 
in L data (life record, behavior in 
situ), Q data (questionnaire, consult- 
ing room, verbal self-evaluation), and 
T data (objective, laboratory, minia- 
ture situational, non-self-evaluative 
test performances). 

In the article (Becker, 1960) to
which I reply, the fact that the actual
correlation between the L-data and
Q-data estimates of what are apparently
equivalents in the two
media sometimes falls far short of
perfection is accepted as disproof of
this theory. This theoretical con- 
clusion is unsubtle; and the thesis of 
my reply is that countless threads of 
evidence contribute to the view that 
the same abstract personality source 
trait commonly operates across different
media. However, certain “perturbations”
have to be recognized
which prevent the simple relation
appearing on the surface, and these 
need to be taken into account in un- 
derstanding psychological measure- 
ment generally. 

In this area of scientific investiga- 
tion, Becker has not asked the right 
question. Unexpected, but system- 
atically evaluated perturbations of 
existing laws have often led to new 
discoveries, not so much by rejecting 
a law as by extending it, e.g., in 
astronomy in the discovery of Nep- 
tune through observed perturbations 
in the expected orbit of Uranus. So 
here, it is argued that there is no 
reason to abandon the notion of uni- 
tary source traits (Cattell, 1946) but 
that one must recognize certain new 
concepts, which we have introduced 
under the terms situational, instru- 
ment, and refraction factors. These 
are supported partly by marshaling 
existing evidence, but also by experi- 
ments undertaken ad hoc, but which, 
through an editorial veto on space to 
reply, have been reported in a sepa- 
rate publication (Cattell, 1960).


THE DEFINITION OF INSTRUMENT
FACTORS

The first and major source of 
perturbation in transmedia factor 
matching arises from what may be 
called instrument factors. Apparently, 
the first explicit recognition and 
demonstration of an instrument fac- 
tor occurred in a structural analysis 
of a very widely selected set of objec- 
tive personality tests, by Cattell and 
Gruen (1955), where a factor ap- 
peared literally produced by diurnal 
variations of sensitivity of a brass 
instrument (GSR). This purely in- 
strumental influence created a fac- 
tor by throwing common variance 
into all types of personality measures 
in which it was used. Such factors 
have appeared since in publications 
by Holzmann and Bitterman (1956), 
F. L. Damarin, D. T. Campbell, and 
L. Berwyn (unpublished), and several 
other unpublished studies known to 
the writer. Indeed, wherever ques- 
tionnaire variables are mixed with 
ratings, attitude scales with question- 
naires, or, sometimes, even one type 
of answer form with another, one or 
more factors may generally be found 
covering all variables having formal 
similarity. 
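
The way a shared form can throw common variance into otherwise unrelated measures may be illustrated by a small simulation. This is only a sketch under stated assumptions, not a procedure taken from the studies cited: the two traits are made independent by construction, the additive score model, the `instrument_weight` of 0.7, the noise level, and all names are hypothetical.

```python
import random


def pearson(x, y):
    """Plain product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, instrument_weight=0.7, seed=0):
    """Two independent traits; A and B by questionnaire, B also by rating.

    Assumed (hypothetical) model: score = trait + weight * form influence + noise.
    """
    rng = random.Random(seed)
    a_quest, b_quest, b_rating = [], [], []
    for _ in range(n):
        trait_a = rng.gauss(0, 1)
        trait_b = rng.gauss(0, 1)      # independent of trait_a
        quest_form = rng.gauss(0, 1)   # influence tied to the questionnaire form
        rating_form = rng.gauss(0, 1)  # influence tied to the rating form
        a_quest.append(trait_a + instrument_weight * quest_form + rng.gauss(0, 0.5))
        b_quest.append(trait_b + instrument_weight * quest_form + rng.gauss(0, 0.5))
        b_rating.append(trait_b + instrument_weight * rating_form + rng.gauss(0, 0.5))
    # Same form, different traits versus different form, different traits.
    return pearson(a_quest, b_quest), pearson(a_quest, b_rating)


same_form_r, cross_form_r = simulate()
print(f"different traits, same form:      r = {same_form_r:.2f}")
print(f"different traits, different form: r = {cross_form_r:.2f}")
```

Under this assumed model the two questionnaire scores correlate appreciably although the traits themselves do not, while the questionnaire and rating scores do not; factoring a battery built this way would be expected to yield a factor covering the variables of similar form.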

The difficulty factors of Wherry 
and Gaylord (1944), and Dingman 
(1958), should definitely be regarded 
as a subspecies of instrument factor. 
Recently, in a study of the Music 
Preference Test of Personality (Cat- 
tell & Anderson, 1953) by Mayeske 
(1961) an instrument factor appeared 
even separating all items resting on 
one form of musical recording from 
those based on another technique. 
Instrument factors have become bet- 
ter understood in the last couple of 
years through extensive studies of 
their appearance in objective moti- 
vation structure analyses (Cattell,
Radcliffe, & Sweney, 1960; R. B.
Cattell & J. Horn, unpublished).
There they appear as “vehicle factors”
covering all objective devices
using the same vehicle, e.g., informa- 
tion, autism, for the objective meas- 
urement of motivation strength. In 
this, and many similar contexts, it 
has been shown that instrument 
factors can be fairly clearly elimi- 
nated by ipsative scoring (R. B. 
Cattell & J. Horn, unpublished, see 
Table 1). 
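
The text does not spell out the ipsative procedure used, so the following is a minimal sketch of one common form, in which each score is expressed as a deviation from the same person's mean over all measures sharing a vehicle. The function name `ipsatize`, the numbers, and the purely additive vehicle effect are assumptions made for illustration.

```python
def ipsatize(scores):
    """Express each score as a deviation from that person's own mean."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]


# Hypothetical strengths of three motives for one person.
motive_strengths = [2.0, 5.0, 3.0]

# Assumed additive vehicle effect: the same amount is added to every
# score obtained through one vehicle (e.g., an information device).
vehicle_level = 4.0
raw_scores = [m + vehicle_level for m in motive_strengths]

ipsative_from_raw = ipsatize(raw_scores)
ipsative_from_pure = ipsatize(motive_strengths)

print(ipsative_from_raw)   # deviations around the person's own mean
print(ipsative_from_pure)  # the same deviations, vehicle effect or not
```

Because the assumed vehicle effect is constant within the person, it drops out when the person's own mean is subtracted. The price is that ipsative scores sum to zero within each person, so only the relative strengths of the motives are retained.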

Before proceeding beyond this 
introduction by illustrations, to a 
more comprehensive definition of the 
concept of instrument factor, it is 
desirable, however, to make clear 
which peripheral factors are not to be 
included. This can be done most
compactly by Figure 1, presenting a
hierarchy which will be clear to
multivariate experimentalists. Incidentally,
the term “artifactors” is
due to Roberts (1959), and has been
sharpened by additional conditions
here to make their separation from
instrument factors cleaner.


The justification for the labels of 
the three forms of “perturbing” factors
reproducible across experiments
(matrices) will be given as we pro- 
ceed. Concentrating first on instru- 
ment factors, let us note that they 
are definable, initially, only in terms 
of intention and perspective. Later, 
the definition can be made more
satisfactory as we develop precise 
concepts indicating various universes 
of variables. For a quality which
persists across the differences of content
of a series of opinionnaires of
similar form, and which perhaps consists
of response to a particular form
inherent in this instrument, though
irrelevant to the content interest of
the experimenter, may yet represent
behavior dependent on a real personality
trait. For example, what
comes as an instrument factor covering
the variables of similar form,
a₁ ... aₙ, may well load (when
a₁ ... aₙ are condensed to a single
variable a, set in the new context of
variables b, c, d, etc.) some important
general personality factor.¹

[Figure 1: hierarchy diagram not reproduced. Labels recoverable from it:
Factors from Psychological Measures; Matrix (Experiment) Specific Factors;
Error Factors; Artifactors; Computation; Score-Dependence Factors;
Factors Reproducible across Matrices (Experiments); General Behavior
Predictive Factors; Perturbing Factors; Form-Specific Instrument Factors;
Observer Perception and Projection Factors; Common Response-Observation
Form Factors; Common General Stimulus Situation Factors.]

FIG. 1. The place of instrument factors in a taxonomy of factors.

1 Incidentally, it is the failure to recognize
this perspective which, in the present writer’s
opinion, has made so much recent work on
response sets a rather uneconomical use of
psychological research time. Whereas educational
psychometrists during the late 1950s
“discovered,” in their opinionnaire tests, response
sets (Cronbach, 1950), social desirability
sets (Edwards, 1957), extremity of
response sets (Berg, 1955), and acquiescence
(tendency to agree, yes-vs.-no; Messick &
Jackson, 1961), these had already been employed
by designers of objective personality
tests in the late 1940s and early 1950s
(Cattell, 1946; Cattell & Gruen, 1955). In the
context of broader personality theories, and
varied behavioral measures involved, it had
already become clear that what itemetrists,
without knowledge of the literature in this
area, later treated merely as “flaws” in their
paper-and-pencil tests, were actually expressions
of well defined personality factors, e.g.,
anxiety or UI 24, comention or UI 20, superego
rigidity or UI 29, as well as UI 31 (Cattell,
1957).

There is thus a sense in which an
instrument factor is a matter of perspective,
i.e., of one’s starting point
and of the plane of experience from
which one chooses the majority of
one’s tests. In this sense, just as dirt
is only “matter in the wrong place,”
so an instrument factor is only “variance
where we didn’t expect it or
don’t want it.” When we are measuring
personality by questionnaire we
obviously do not want each and all of 
the diverse personality dimensions 
included to be contaminated by what 
might be called a “generalized specific,”
i.e., a specific to questionnaires.
And the fact that that specific may, 
indeed, be something more than a 
trivial specific, but an expression of a 
single important personality factor 
spread over and contaminating all 
the alleged diverse personality meas- 
ures, does not make the measurement 
hash any more acceptable! 

When more progress has been made 
toward a systematic taxonomy of 
tests, on some such objective basis as 
that worked out by Cattell and 
Warburton (in press), it would be- 
come possible to set up also a rela- 
tively objective classification of in- 
strument factors, according to the 
types of personality approach to 
which they are tangential. For 
“form” and “content” are quite 
subjective categories, and, in any 
case, by no means exhaust the pos- 
sible planes of experiment to which 
instrument factors can be orthogon- 
ally intrusive. For the time being, 
however, we must take a relativistic 
position, and one centered in “content.” 
On this basis we shall 
contingently define an instrument factor 
as any uniquely (simple structure) 
rotated factor which covers a whole 
set of diverse variables having formal 
resemblance in presentation, mode of 
permitted response, or scoring, and 
which does not extend to tests of the 
same psychological content when 
couched in other modes of formal 
presentation, response, etc. 


THEORY OF SOURCES OF PERTUR- 
BATION AFFECTING TRAIT 
ALIGNMENT 


It should be noted that there are 
two distinct, though related, senses 
in which a source trait can be said to 
be the same or not the same in two 
different media: 

1. An estimate of the factor from 
the variables in one medium may 
correlate less than unity with its 
estimate from variables in another 
medium, even when attenuation-cor- 
rected for (a) unreliability of meas- 
urement, and (b) imperfection of 
estimate. 

2. It may not be possible to dis- 
cover a trait, when factoring both 
media together, which has simple 
structure across both media and also 
possesses the hypothesized, similar-meaning 
salient loadings in both 
media. (Whether one also means 
that the simple structure position in 
one medium will not project into the 
other we shall discuss below.) 
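
Sense 1 turns on the classical correction for attenuation. The following is a minimal sketch, assuming Spearman's formula and assuming that the "imperfection of estimate" can be summarized by a validity coefficient (the correlation of each factor estimate with its true factor). The function name, parameter names, and numbers are illustrative only and are not taken from the studies discussed.

```python
import math


def disattenuated_r(r_obs, rel_1, rel_2, val_1=1.0, val_2=1.0):
    """Correct an observed cross-media correlation between two factor estimates.

    rel_1, rel_2: reliability coefficients of the two estimates (Correction a).
    val_1, val_2: assumed correlation of each estimate with its true factor
                  (Correction b); 1.0 means a perfect estimate.
    """
    for name, value in (("rel_1", rel_1), ("rel_2", rel_2),
                        ("val_1", val_1), ("val_2", val_2)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1], got {value}")
    # Spearman's correction, extended (as an assumption) by the two validities.
    return r_obs / (math.sqrt(rel_1 * rel_2) * val_1 * val_2)


# Hypothetical figures: an observed r of .40 between an L and a Q estimate.
corrected = disattenuated_r(0.40, rel_1=0.81, rel_2=0.81, val_1=0.8, val_2=0.8)
print(round(corrected, 3))
```

If the corrected value still falls well short of unity, nonalignment in Sense 1 cannot be attributed to Corrections a and b alone.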

Becker has been concerned with 
the first of these, denying alignment 
without first checking that Corrections 
a and b could not restore the 
correlation to unity. In any case, the 
second meaning is more important. 
If unity in this sense holds, personal- 
ity theory is profoundly simplified, 
and it is only a matter of the mechan- 
ics of statistics to produce weighted 
measures from the two media that 
will approach a correlation of unity.2 

In the larger collation of data and 
new experimental work (Cattell, 
1960), from which the present article 
abstracts, it has been shown that the 
presence of unrecognized instrument 
factors in the two media will prevent 
alignment either in Sense 1 or 2, un- 
less special new techniques are used. 
Before devoting a section to closer 
inspection of this result, however, it 
is desirable to set out a clear theory 
about more general sources of per- 
turbation. For, in principle, one can 
see that there are some six possible 
origins of the failure to find a one-to- 
one alignment of primary personality 
factors measured in one medium with 
those measured in another. Some of 
these will produce instrument factors; 
others will contribute to other kinds 
of nonalignment to be described. 


Sources of Nonalignment 


Human transmission (perception, evaluation, projection, memory) of 
score values. Largely this means rating 
and self-rating (L and Q data). This 
is too subtle and complex a field— 
hitherto handled too simply in terms 


2 Such a procedure should be sharply dis- 
tinguished from what Becker (1960) appears 
to advocate, and describes Gough as doing, 
namely, to force a Q scale to align itself with 
an L factor by assiduous item selection. Any 
such procedure contributes nothing to our 
knowledge of structure, but only hides the 
problem. If it succeeds, and if our theory is 
correct that L-data factors are the most 
heavily contaminated of any medium with 
irrelevant factors, this is forcing a poorly 
oriented measure to agree with a still poorer 
one. 
of “halo”—for the present abstract 
summary to be illustrated in available 
space (see Cattell, 1960). Theoreti- 
cally, the pattern of correlations, and 
therefore of obtained factors, could be 
distorted by, and only by, properties 
of the individual and his relation to 
the recorder which affect the record- 
ing of all his behavior variables, and 
by properties of the perceiving re- 
corder. The former can be divided 
into (a) value relationships, of which 
liking-disliking (a constituent in halo) 
is only one; and (b) perspicacity or 
visibility effects, e.g., extraversion 
making the ratee more known, posi- 
tion effects making certain behaviors 
more clear. The latter can be divided 
into projections of (a) stereotypes or 
cultural clichés,3 and (b) refraction 
factors, discussed below, peculiar to 
one medium. In all the “perceiving 
recorder effects” a correlation is produced 
by “projection” of a (perhaps 
quite unconscious) conviction that 
certain variables go together. Some 
of these may produce typical instru- 
ment factors, uniformly and about 
equally loading all variables in the 
medium; but others may load only 
some variables, producing what are 
perhaps best described as “perception-evaluation” 
projection factors, 
and which are not true instrument 
factors. 

Communality of variables in respect 
to some trait required for handling a 
similar formal performance in all of 
them (or for registering in an observa- 
tion situation). This is essentially 
one of the two main sources (see fol- 
lowing paragraph) of instrument 
factors only. The countless possibili- 


3 Since sociologists have ruined “stereotype,” 
by applying it equally to a widespread 
concept which either (a) does or (b) does not, 
correspond to statistical reality, I suggest 
“cultural cliché” explicitly for a widespread 
cultural concept which is significantly differ- 
ent from any externally existing pattern. 
ties may be illustrated by e.g. the use 
of 30 scales in all of which the score 
(in one direction or the other) de- 
pends on an ability to read, or on in- 
formation or skill of expression, or 
tendency to say yes rather than no, 
etc. 

Communality of variables in respect 
to scoring or scaling applied after ad- 
ministration. Quite apart from com- 
mon demands on the subject’s actual 
performance as in the previous para- 
graph, anything in the formal scoring 
procedure which tends to give similar 
sigmas, and skewedness (and in some 
cases means) throughout one class of 
tests will tend to create higher corre- 
lation among them and a common 
factor. That is to say, if the matrix 
of correlations of tests a_1 through a_n 
were just the same, on a rank formula, 
as that for b_1 through b_n, but if all 
the a’s, on the one hand, and all 
b’s, on the other, have similar dis- 
tribution, then basing the matrix 
afresh on a product-moment formula 
will tend to give an instrument factor 
for the a’s and/or the b’s separately. 
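
The scoring point can be checked numerically. The sketch below is an illustration under stated assumptions rather than a reanalysis of any data: two underlying scores are drawn from a correlated bivariate normal population, and hypothetical "scoring" transformations (a positively skewed form, `exp(z)`, and a negatively skewed form, `-exp(-z)`) are applied. Both transformations preserve rank order, so the rank correlation is unchanged, while the product-moment coefficient depends on whether the two tests share a distribution shape.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(x):
    """Ranks 1..n; assumes no ties (true here for continuous scores)."""
    order = sorted(range(len(x)), key=x.__getitem__)
    out = [0.0] * len(x)
    for rank, idx in enumerate(order, start=1):
        out[idx] = float(rank)
    return out


def spearman(x, y):
    """Rank correlation: the product-moment formula applied to ranks."""
    return pearson(ranks(x), ranks(y))


rng = random.Random(0)          # fixed seed so the run is repeatable
n, rho = 20000, 0.6             # illustrative sample size and latent correlation
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in z1]

a1 = [math.exp(v) for v in z1]      # test a1: positively skewed scoring
a2 = [math.exp(v) for v in z2]      # test a2: same scoring form as a1
b1 = [-math.exp(-v) for v in z2]    # test b1: same ranks as a2, opposite skew

rank_same, rank_mixed = spearman(a1, a2), spearman(a1, b1)
pm_same, pm_mixed = pearson(a1, a2), pearson(a1, b1)
print(rank_same, rank_mixed, pm_same, pm_mixed)
```

The two rank coefficients coincide, whereas the product-moment coefficient is higher for the pair sharing a scoring form. Spread over a whole block of tests, that excess is the kind of common variance the paragraph above describes as a scoring-produced instrument factor.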

Coincidence of different global stimulus 
situations with different test media 
administrations. If a person answered 
one set of questionnaires in private, 
and another orally and publicly 
(which is akin to the interview or be- 
havior rating situation), we should 
expect real differences in response 
due to the actual stimulus situation, 
covering the occasion on which all 
items of one test were answered, be- 
ing different from that covering the 
other test-taking setting. A priori 
this could create both an instrument 
factor, conterminous with each me- 
dium-situation, and also a change in 
loading of the same items on the 
same personality factors in the two 
situations. 

Habitual broad area differences in 
actual trait development and expres- 
sion. Among children, for example, 
we should expect the particular be- 
havior variables representing, say, 
the dominance factor, to be expressed 
to different degrees in the home en- 
vironment and in the school environ- 
ment. This is analogous to the point 
in the above paragraph, except that 
the influence is expressly conceived 
not to lie in the temporary measure- 
ment situation itself, but in the pro- 
longed life situation, leading to real 
differences of actual habit strength, 
i.e., of the trait itself. Factor analytically, 
this might produce a home 
dominance factor and a school domi- 
nance factor, representing the relative 
impact of home and school, respec- 
tively, or alternatively, one factor 
modified by two other factors, each 
peculiar to one broad area. If the 
former proves to be more characteristic, 
then we can confidently predict 
that the two first-order factors will 
correlate highly and yield a single 
second-order dominance factor. Even 
if the former is true it would be possible, 
in a rough factoring, to perceive 
the structure as that of a home and 
school instrument factor (as in the sec- 
ond possibility) but psychologically, 
the interpretation, if the proper struc- 
ture is obtained, would now be differ- 
ent from an instrument factor effect. 
The area differences would then be 
interpreted as real structure differ- 
ences, and the concept of a single 
dominance trait would be discovered 
and justified only at the second-order 
factor level. 

Differences among media in density 
of representation of variables. If in 
sampling variables in the ability field 
an experimenter accidentally took 
one variable for each of Thurstone’s 
primary abilities and factored, he 
would obtain, straightaway, i.e., as 
a first-order factor, that general ability 
factor which, in any “dense” representation 
of variables, appears only 
as a second-order factor (Thurstone, 
1938). This concept of density of 
variable representation has been de- 
veloped further elsewhere (Cattell, 
1957, pp. 808-817), but it is easy to 
see that if there were really large dif- 
ferences of density unrecognized be- 
tween media we should obtain no 
correlational alignment of the pri- 
maries in the two fields. Only on ex- 
ploring the second order would the 
possibility arise of discovering that 
a second order in one medium is the 
same as a first order in the other. 
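
The density argument can be illustrated with a small model calculation. The sketch assumes a strict two-level model in which every test loads only on its own primary and every primary loads only on one second-order factor; the loadings (.8 and .7) and the function names are hypothetical. It uses Spearman's tetrad criterion: when one common factor accounts for all correlations, every tetrad difference r_ab r_cd - r_ac r_bd vanishes.

```python
from itertools import combinations

LOADING = 0.8   # assumed loading of every test on its own primary
GENERAL = 0.7   # assumed loading of every primary on the second-order factor


def implied_r(primary_i, primary_j):
    """Model-implied correlation between two different tests."""
    r = LOADING * LOADING
    if primary_i != primary_j:
        # Tests on different primaries are linked only through the general factor.
        r *= GENERAL * GENERAL
    return r


def max_tetrad(primary_of_test):
    """Largest absolute tetrad difference over all quadruples of tests."""
    worst = 0.0
    for a, b, c, d in combinations(range(len(primary_of_test)), 4):
        def r(i, j):
            return implied_r(primary_of_test[i], primary_of_test[j])
        tetrads = (
            r(a, b) * r(c, d) - r(a, c) * r(b, d),
            r(a, b) * r(c, d) - r(a, d) * r(b, c),
            r(a, c) * r(b, d) - r(a, d) * r(b, c),
        )
        worst = max(worst, max(abs(t) for t in tetrads))
    return worst


sparse = [0, 1, 2, 3, 4, 5]   # one test per primary
dense = [0, 0, 1, 1, 2, 2]    # two tests per primary

print(max_tetrad(sparse), max_tetrad(dense))
```

With one test per primary the tetrads vanish, so the general factor emerges at the first order; with two tests per primary they do not, and the general factor can be reached only after the primaries have been extracted.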

Actually, as soon as systematic ex- 
ploration of second-order structure 
in questionnaires reached to six 
factors (Cattell, 1957; Cattell & 
Scheier, 1961; Cattell & Warburton, 
in press), it became evident that 
four second-order questionnaire fac- 
tors aligned with four first-order ob- 
jective test factors (UI 19, 20, 24, 
and 32); and in two of these, UI 24 
(anxiety) and UI 32 (extraversion), 
the agreement is perfect within small 
limits of experimental error. An in- 
stance from a different realm, but 
amounting to a correlation of only 
0.80 between the two media, exists in 
Tollefson’s demonstration (1961) that 
the second-order extraversion factor 
in the questionnaire is a first-order 
factor in the Humor Test of Person- 
ality. These alignments (from the ear- 
lier, 1954-1957, publications above) 
are not mentioned in Becker's article 
(1960), perhaps because his com- 
ments are all on L- and Q- (rather 
than T-) data alignments. But the 
findings are highly relevant as show- 
ing that there does exist a corner of 
the intermedia jigsaw puzzle which 
is beginning to fit in place. These 
five experimental instances alone are 
surely sufficient to encourage us in 
that rejection of nihilism which this 
article undertakes. 

To risk a prediction in the little 
explored field of “density,” one 
might judge that variables in Q data 
will prove somewhat more “dense” 
than L data. But substantially, as 
the above evidence shows, one can 
conclude only that variables as com- 
monly chosen are much more dense 
in Q than T data. This is under- 
standable; e.g., in the T-data anxiety 
factor, we test startle response by a 
single cold pressor test (Cattell & 
Scheier, 1961) whereas in most anxi- 
ety questionnaires there are a dozen 
items asking in different ways how 
easily the person startles. Cronbach 
(1960), Comrey, and others who 
criticize low homogeneity when re- 
viewing factor scales, are perhaps un- 
wittingly driving their flocks toward 
the more serious danger of using 
personality scales heavily loaded in 
spurious “specific” variance of this 
latter kind, instead of watching that 
their scales deal with personality 
factors having broad psychological 
relevance and effectiveness. 

If the above search for sources of 
perturbation has been truly exhaus- 
tive, our summary must include 
three other forms of distortion be- 
sides instrument factors, constituting 
four in all, as follows (beginning with 
instrument factors): 

1. Test instrument factors, includ- 
ing common test form (response- 
observation-score) factors, and com- 
mon test general stimulus situation 
factors. 

2. Modification of actual trait by 
influences peculiar to one area of ex- 
pression, producing primaries for 
each area and requiring conceptual 
unity to be sought at a higher order 
level. 

3. Difference of density of repre- 
sentation of variables, as commonly 
unconsciously chosen by experimen- 
ters, in their different media, result- 
ing in a higher order in one medium 
matching a lower order in another. 

4. Perception-evaluation or pro- 
jection factors, which trespass on the 
variance of the variables used to estimate 
personality factors, not by uniformly 
loading all in one medium (as 
does an instrument factor) but hav- 
ing each a characteristic form, and, 
when restricted to one medium, hav- 
ing the properties of refraction fac- 
tors described below. 


THE PRACTICAL PROBLEM OF REACH- 
ING PERSONALITY STRUCTURE 
DESPITE DISTORTIONS 


If the above theoretical analysis is 
correct the manifest correlational 
picture of personality structures will 
be less like Whistler’s portrait of his 
mother than the cubist’s rendering 
of the same, fractured into surpris- 
ing new supernumerary planes and 
facets. To translate from the latter 
to the former, it is necessary that 
research, first, check the hypotheses 
about the forms of distortion at work 
and, second, find experimental and 
statistical means for isolating and 
setting aside these various perturbing 
influences. 

One cannot do more than glance 
at these tasks here. As to the first, 
our initial examination of data shows 
definitely that form-specific instru- 
ment factors exist, while my col- 
leagues and I have also begun to 
give evidence for the Sources 2, 3, 
and 4. The source of nonalignment 
labeled 2—local area modification of 
real traits—has been more fully 
illustrated elsewhere (Cattell, 1960) 
but must be left to others systemati- 
cally to investigate. Source 3, 
changing density with changing me- 
dium, has already been substanti- 
ated. 

As to the second task—segregating 
the distorting influences to arrive at 
essential structure—the unraveling 
of Effects 2 and 3 above is straight- 
forward, by second-order factoring, 
though the possibility has been 
mooted above that Source 2 could 
produce two instrument factors, be- 
yond a single first-order factor, in- 
stead of two first orders resolving 
into a second. 

Setting 2 and 3 aside, therefore, 
we shall devote the present section to 
unraveling the effect of instrument 
factors, 1 above, and the following 
section to perception-evaluation-projection 
phenomena, 4 above. 

The special experiments with in- 
strument factors described elsewhere 
(Cattell, 1960) proceeded first to find 
what happens when one factors cor- 
relation matrices derived from known, 
numerically stated factor models, 
and secondly, to experiment with 
varieties of solution in actual psy- 
chological data where the existence 
and boundaries of an instrument 
factor were well known beforehand. 
These experiments showed that: 

1. Where the instrument factor 
covers all variables, i.e., where they 
are not embedded in a larger matrix, 
with other media to constitute a hyperplane 
and determine unique rotation, 
the typical investigator and procedure 
will not find or be aware of the 
instrument factor. 

2. If the instrument factor is not 
found then either: (a) the correla- 
tions among the primaries will be dis- 
torted (if it is positive on all and they 
are all positively correlated, it will 
increase their correlations); or, (b) 
the simple structure which really exists 
among the primaries will not be 
found, or found only in very impaired 
form. Commonly b will predominate, 
but both will operate. 
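
Effect (a) can be shown with a short model calculation. This is a sketch under simplifying assumptions, not the procedure of the experiments cited: two primaries are taken to be truly orthogonal, each marked by k standardized tests with equal trait loadings, and every test also loads equally on an unrecognized instrument factor. The function name and the loadings are hypothetical.

```python
def composite_r(trait_loading, instrument_loading, markers_per_factor=2):
    """Correlation between unit-weighted marker composites of two orthogonal
    primaries in one medium, when all tests share an instrument factor."""
    if trait_loading ** 2 + instrument_loading ** 2 > 1.0:
        raise ValueError("communality of a test cannot exceed 1")
    k = markers_per_factor
    within = trait_loading ** 2 + instrument_loading ** 2  # same-primary test pair
    between = instrument_loading ** 2                       # different-primary pair
    variance = k + k * (k - 1) * within   # variance of a sum of k standardized tests
    covariance = k * k * between          # covariance between the two sums
    return covariance / variance          # both composites have the same variance


print(composite_r(0.6, 0.0))   # no instrument factor: composites stay uncorrelated
print(composite_r(0.6, 0.4))   # shared instrument variance: a spurious positive r
```

The primaries are orthogonal by construction, so any correlation between their composites here is produced wholly by the ignored instrument factor.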

After this demonstration of the 
effect of an instrument factor in a 
single medium we proceeded to 
models and real instances containing 
blocks of variables uniformly from 
each of two or three media. Herein 
each medium was covered by one 
instrument factor, but true personality 
factors existed in the sense of 
having a simple structure position 
with salient loadings on variables of 
similar meanings in both media. Here 
it was shown: 

1. If one obtains the best possible 
simple structure (perhaps imperfect 
because of mixed-in instrument fac- 
tor) among variables separately in 
each medium, the same simple struc- 
tures cannot usually be found when 
the media are put together. 

2. One reason for this is that if one 
projects the simple structure posi- 
tion satisfactorily obtained in one 
medium into the second,4 it definitely 
does not give simple structure within 
the second. 

3. If, however, one first admits the 
existence of, and locates by simple 
structure in the combined matrix, the 
instrument factors (which can now 
have determinate hyperplanes), then 
the true personality factors, operat- 
ing across both media, can be located 
(in blind simple structure rotation). 
A successful example of this in real 
data—objective motivation measure- 
ment (R. B. Cattell & J. Horn, un- 
published)—is shown in Table 1 
here, and in other models elsewhere 
(Cattell, 1960). Our ignorance of 
this principle in 1948 was presum- 
ably responsible for the chaotic out- 
come of the first extensive transme- 
dium factor analyses (Cattell & Saun- 
ders, 1950, 1955). 
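
A model calculation shows why cross-medium alignment looks poor until the instrument factors are admitted. The sketch assumes one personality factor loading its markers equally in the L and Q media, one instrument factor per medium, all factors orthogonal, and unit-weighted composites of k markers; the names and loadings are hypothetical.

```python
import math


def sum_variance(k, pair_r):
    """Variance of a sum of k standardized tests with common pairwise r."""
    return k + k * (k - 1) * pair_r


def cross_media_r(trait, instr_l, instr_q, k=2):
    """Correlation between the L and Q composites of the SAME personality factor."""
    var_l = sum_variance(k, trait ** 2 + instr_l ** 2)
    var_q = sum_variance(k, trait ** 2 + instr_q ** 2)
    # L and Q markers of one factor correlate only through the trait itself.
    return (k * k * trait ** 2) / math.sqrt(var_l * var_q)


def same_medium_r(trait, instr, k=2):
    """Correlation between composites of two DIFFERENT orthogonal factors
    measured in one medium, linked only by that medium's instrument factor."""
    return (k * k * instr ** 2) / sum_variance(k, trait ** 2 + instr ** 2)


print(cross_media_r(0.5, 0.6, 0.3))  # same factor, different media
print(same_medium_r(0.5, 0.6))       # different factors, same medium
```

With these illustrative loadings, the same factor measured in two media correlates less than two different factors measured in one medium, which is the pattern that invites a mistaken conclusion of nonalignment.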

Incidentally, it will be obvious 
that missing the instrument factor, 
failing to rotate it correctly if one 
does not miss it, and encountering 
the subsequent distortion are due 
respectively to (a) the lack of a test 
for factor extraction that will decide, 


4 This cannot be done, of course, simply by 
applying the same discovered transformation 
(λ) matrix to the centroids, because the latter 
begin at different positions. One first discovers 
by the Procrustes program the λ most 
nearly reproducing the first medium simple 
structure from the joint medium centroid. 
TABLE 1 
PSYCHOLOGICAL AND INSTRUMENT FACTORS AS FOUND IN OBJECTIVE, 
DYNAMIC TRAIT SIMPLE STRUCTURE 

No.  Attitude Variable                              Device Measurement 
 1   Desire for good self-control                   Information measure 
 2   Wish to know oneself                           Information measure 
 3   Wish never to become insane                    Information measure 
 4   Readiness to turn to parents for help          Information measure 
 5   Feeling proud of one’s parents                 Information measure 
 6   Desire to avoid fatal disease and accidents 
 7   Wish to get protection from A bomb 
 8   Desire for good self-control                   Autism measure 
 9   Wish to know oneself                           Autism measure 
10   Wish never to become insane                    Autism measure 
11   Readiness to turn to parents for help 
12   Feeling proud of one’s parents                 Autism measure 
13   Desire to avoid fatal disease and accidents 
14   Wish to get protection from A bomb 

(Factor loading columns not recoverable.) 

Note.—The theoretically required salients to define the factors are boxed in, 
bottom of the parental sentiment factor column, the salients are high (above .09) where 
theoretically required to be. 

a Attitudes 13 and 14 are the same as 6 and 7, but in a different medium, and similarly for the other cross-media 
personality factors. 


to within less than an error of two or 
three factors, how many should be 
extracted; (b) having no variables 
from other media to give a hyper- 
plane for it; and (c) the variance 
that should have been in the instru- 
ment factor being pushed into the 
personality factors, destroying the 
clarity of their hyperplanes. The 
remedy which worked in the above 
cases was to give good technical at- 
tention to these issues. 


On ISOLATING TRANSMEDIUM 
PERSONALITY FACTORS AND 
REFRACTION FACTORS 


Our final step consisted in return- 
ing to the actual L and Q data from 
which Becker infers that personality 
factors are unmatchable across me- 
dia, and showing that when exam- 
ined by more penetrating concepts, 
as above, uniquely determinate, psy- 
chologically meaningful, factor pat- 
terns appear, expressing themselves 
appropriately in both media for each 
factor. This has theoretical interest 
in giving additional substance to 
Point 3 above, by introducing the no- 


tion of refraction factors, and in pro- 
ducing some order in that L-Q fron- 
tier which has hitherto been the most 
hopelessly obscure of the transmedia 
relationships. Nevertheless, this ap- 
proach does no more than reveal 
some order, and at the same time 
opens the door on a lot of problems, 
particularly in the field of behavior 
rating, which will now demand sys- 
tematic investigations. 

It is not easy to find in any pub- 
lished study of the past 20 years 
(ever since personality structure re- 
search began in earnest) an experi- 
ment really adequate in reaching the 
technical conditions necessary to 
get anywhere on this question. One 
needs, among other things, an experi- 
ment: (a) on a sufficient sample for 
sampling errors not to be intrusive; 
(b) where the subjects had a long 
testing period in which they were 
simultaneously rated in situ and subjected 
to questionnaires, comprehensive, 
reliable, and valid enough to 
define several factors clearly; (c) 
where ratings and questionnaire vari- 
ables were strategically chosen to 
represent psychologically familiar 
factors, already vouched for by ear- 
lier researches; and (d) where ratings 
were carried out by peers and under 
the requisite conditions described 
elsewhere (Cattell, 1946, 1957). Prob- 
ably the most satisfactory data avail- 
able is that in which the experi- 
mental work was broadly conceived 
and painstakingly carried out by 
Coan, on 7-8-year-old children (Cattell 
& Coan, 1957, 1958). It suffers 
only with respect to d, in that ratings 
were made by teachers instead of 
peers, and perhaps in reduced homo- 
geneity of sample through equal in- 
clusion of boys and girls. 

Taking the data of this experiment 
we find that 24 rating variables have 
already been factored and blindly ro- 
tated into 12 very definite simple 
structure factors, each represented by 
two markers (see Table 5 in Cattell, 
1960). Similarly, 24 variables in Q 
data, each consisting of a scale of 
about eight items, have been re- 
solved as 12 well known simple struc- 
ture factors, each marked essentially 
by two salient variables. However, 
on psychological inspection of these 
resolutions, the hypothetical position 
was taken that only 10 of the 12 
factors were common to the two 
matrices, the remaining 4 being spe- 
cial, 2 to each matrix. 

The two sets of 24 variables were 
now combined and intercorrelated in 
a cross-medium, L-Q matrix of 48 
variables, which, by Tucker’s test, 
yielded 16 factors. (With the hy- 
pothesis of matching, above, one 
would expect 14, but it is usual to 
find some new factor created by the 
mixture when two matrices are 
pooled.) The structure of this new 
factor space proved to be complex. 
Projection of simple structure ob- 
tained in one into the other, as de- 
scribed earlier (Footnote 4), would 
not yield a good combined simple 
structure. Attempts to force simple 
structure by varimax, oblimax, or 
other “analytical” programs failed 
because these rigid programs could 
not recognize and uniquely rotate 
the instrument factors, which, on the 
basis of the above principles and 
findings, we knew must be present. 
Only a patient and comprehensive 
exploratory visual rotation (aided by 
the photographic Rotoplot program 
on Illiac), over 22 rotations, yielded 
a position of such stability that one 
could repeatedly return to it. In 
reaching this position we found that 
the hyperplanes in the data were 
noticeably a little broader (about 
±.13 instead of ±.10) than those 
existing in one medium alone. 
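
The hyperplane count referred to here is simply the share of a factor's loadings that lie within a chosen band around zero. A minimal sketch follows; the function name and the column of loadings are hypothetical and are not values from Table 2.

```python
def hyperplane_percent(loadings, width=0.10):
    """Percentage of a factor's loadings lying within +/- width of zero."""
    if not loadings:
        raise ValueError("need at least one loading")
    inside = sum(1 for x in loadings if abs(x) <= width)
    return 100.0 * inside / len(loadings)


# Hypothetical loadings of ten variables on one rotated factor.
column = [0.62, 0.55, 0.12, -0.08, 0.03, -0.11, 0.00, 0.09, -0.13, 0.21]

print(hyperplane_percent(column, width=0.10))
print(hyperplane_percent(column, width=0.13))
```

Widening the band from .10 to .13 admits the borderline loadings, so a count taken at the broader width is not directly comparable with one taken at the narrower width in a single medium.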

On examining the solution, set out 
in Table 2,5 we found that we had 
essentially an instrument factor for 
L data and another for Q data (not 
set out at the end of the matrix, but 
marked In_L and In_Q in Table 2). 
There are also two other factors, 
which we would guess might be projected 
“clichés,” numbered 13 and 
16. The interesting fact is that when 
this debris is set aside, patterns for 
the well known personality dimensions 
C (Ego strength), D (Excitability), 
F (Surgency), and H (Parmia) 
appear, with the appropriate four 
markers (2L and 2Q) on each, though 
the hyperplanes are pierced by one 
or two random appreciable loadings 
on other factors. (Counting within 
±.13 they reach acceptable percentages 
of 65, 73, 77, and 54 in the hyperplane.) 


5 The matrix containing the correlations 
among factors, the lambda matrix, and the centroid 
for Table 2 have been deposited with the 
American Documentation Institute. Order 
Document No. 6570 from ADI Auxiliary Publications 
Project, Photoduplication Service, Library 
of Congress; Washington 25, D. C., remitting 
in advance $1.75 for microfilm or 
$2.50 for photocopies. Make checks payable 
to: Chief, Photoduplication Service, Library 
of Congress. 

However, a hitherto undescribed 
phenomenon is encountered here, 
namely, the appearance of factors 
restricted to one medium, and ap- 
pearing in one or both of the separate 
media alongside, and simultaneous 
with, the appearance of the joint me- 
dium factor having the same person- 
ality meaning. This is illustrated by 
C and Cy, D and Dag (Table 2), 
wherein the real psychological factor 
(C or D), loading the four essential 
variables across both media, carries 
alongside it an incomplete image of 
itself in each medium. The incom- 
plete image loads only the two vari- 
ables which belong in one medium. 
To these patterns, occurring simul- 
taneously with the combined pat- 
tern, I have tentatively given the 
name “refraction factors,” since they 
are analogous to what would be seen 
if one looked at an object both di- 
rectly and refracted through a prism 
of another medium, one on each side 
of the line of vision. 

Actually Table 2 does not simul- 
taneously present all refraction fac- 
tors for all real factors, but this 
should not disturb us any more than 
the failure of a single archeological 
digging to provide all the bones of a 
skeleton or all cultural elements for 
a given period. For, as it has been 
argued elsewhere (Cattell, 1958) any 
matrix typically has strictly as many 
dimensions as variables, and prob- 
ably even more hyperplanes, i.e., one 
is always taking a selection in simple 
structure among more possible hy- 
perplanes than one has chosen to ex- 
tract factors. Further search should 
be made for refraction factors, there- 
fore. 

A vital empirical question affecting 
further inference at this point con- 
cerns the correlations among the real 
and refraction factors for a given 
psychological dimension. We had 
expected them to be positively cor- 
related, but the best estimate from 
existing data is that they are only 
slightly correlated, if at all. It is pos- 
sible, however, that if more dimen- 
sions had been taken out their cor- 
relations would have been increased 
(see Diagram 5, Cattell, 1958). 
Exploration and evaluation of pos- 
sible hypotheses to account for re- 
fraction factors would require at 
least an article to itself. One does not 
go too far in interpretation, however, 
to say that they imply that each indi- 
vidual, in addition to his assessment 
on the real factor, gets a “bonus” 
on the variables peculiar to each 
medium, which is substantially unrelated 
to his status on the real 
factor. Our hypothesis is that these 
refraction factors belong to the perceptual 
class (Class 4 above) 
and arise from the behavior 
in question being differently perceived 
in the two media. In self-rating 
a varying sensitivity and self-awareness 
(only in special cases a 
function of the trait being rated) 
could provide the differing “bonus” 
from person to person. The differing 
visibilities of these individuals from 
the position of the rater, giving the 
L-data refraction, would be expected 
to be quite unrelated to the order of 
their individual sensitivities in self- 
rating. 
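
The “bonus” interpretation can be written as a factor specification. The notation below is introduced here purely for illustration and does not appear in the original analysis. It assumes standardized variables, a real factor F, one refraction factor per medium (R_L and R_Q), unique parts u, and all factors mutually uncorrelated, in line with the near-zero correlations reported above; instrument factors are omitted for brevity.

```latex
\begin{aligned}
x_{L} &= a_{L}\,F + b_{L}\,R_{L} + u_{L}, \\
x_{Q} &= a_{Q}\,F + b_{Q}\,R_{Q} + u_{Q}, \\
r(x_{L1}, x_{L2}) &= a_{L1}\,a_{L2} + b_{L1}\,b_{L2}, \\
r(x_{L}, x_{Q}) &= a_{L}\,a_{Q}.
\end{aligned}
```

On these assumptions two markers within one medium correlate through both the real factor and that medium's refraction factor, whereas markers in different media correlate through the real factor alone.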

If this is correct one might also ex- 
pect the lesser loadings, on variables 
other than the two salients, to be 
systematically different on the two 
refraction factors. For example, the 
rating by others, in the case of a 
factor much concerned in delin- 
quency, might impart something of 
the stereotype of a scoundrel, where 
the Q-data refraction factor might 
convey more of a good person in diffi- 
culties. Since our main concern is 
with the order which emerges, little 
has been said of the “debris” factors, 
notably 13 and 16 in Table 2. But our 
conclusion, tentatively, is that “evaluative” 
and “visibility” factors other 
than refraction factors are at present 
run together in the insufficient factor 
space so far used, and that, especially 
in the L data, these “halo” and related 
factors are substantial. They 
do not appear to be any known sec- 
ond-order factors, which can some- 
times appear in inadequate first- 
order factorings. It has sufficed for 
our present investigation simply to 
set them aside. But if closer re- 
search scrutiny in this heap shows 
that our present indications are cor- 
rect that these Class 4 perturbers are 
much larger in L than Q data, then 
the practice of trying to force ques- 
tionnaire factors to align with rating 
“criteria” comes still more in question 
than it is today. 

That the reader may more directly 
evaluate the nature and quality of 
the simple structure in Table 2 we 
have set out in Figure 2 a plot of two 
psychological (“real”) factors therefrom. 


SUMMARY AND CONCLUSION 


1. Correlations among primary per- 
sonality factors in different media do 
not provide a simple pattern of one- 
to-one relations, and fall decidedly 
short of unity between two factors 
of the same apparent psychological 
meaning. 

2. The theoretical possibilities and 
the natural occurrences of perturbing 
influences hiding true alignment have 
been discussed and demonstrated. 
They have been classified as (a) test 
instrument factors; (b) actual trait
modification by differing experience 
in subareas, requiring unity to be 
sought at a higher order level; (c) 
differences of density of representa- 
tion of variables in different media; 
and (d) perceptual-evaluation-projection
factors, occurring where human
transmission of observations is
involved.

Fig. 2. Simple structure appearing between cross-media personality
factors. (Marker variables labeled)

3. Experimenters, especially when 
leaving rotation decisions to falsely 
founded analytical computer pro- 
grams, commonly miss instrument 
factors, but when these are properly 
isolated and set aside by careful ex- 
periment it is possible to find the well 
known primary personality factors, 
each appearing as a single factor ex- 
pressing itself in both L and Q media. 

4. Regard for instrument and
second-order–first-order factor relations
is already producing clarity and 
consistency in personality structure 
research; but much remains to be ex- 
plored regarding at least four forms 
of distortion which apparently occur
where human transmission is involved,
i.e., in L and Q data. The
new phenomenon of refraction factors
particularly calls for intensive
research.

5. One must distinguish between
the question “Does a single simple
structure factor exist loading variables
of the same meaning on both
media?” and “Can one get a perfect
correlation between estimates of
apparently (by meaning) the same
factor, made in the two media?” Even
when the answer to the first, so
important for personality theory, is
“Yes,” as this paper claims to have
shown, the answer to the second
remains “No.” The variance due to
instrument factors, refraction factors,
and any evaluation-perceptual factors
peculiar to one medium will
remain with and confound the estimate
of a factor from that medium. Possibilities
exist, by ipsative scoring and
discriminant function methods, of
improving the correlation between
estimates of the same factor made in
two different media, and a path has
been opened above toward a proper
estimation of the correction for at- 
tenuation that can be applied to see 
if the correlation could be unity. 
But these developments await re- 
search. 


COMMENTS ON CATTELL’S PAPER ON
“PERTURBATIONS” IN PERSONALITY 
STRUCTURE RESEARCH 


WESLEY C. BECKER 
University of Illinois 


Cattell’s reply to my earlier paper 
(Becker, 1960) questioning the valid- 
ity of published statements of a one- 
to-one matching between L-data and 
Q-data factors concedes the inaccu- 
racy of those statements (see the first 
point in his summary). However, in 
the process of developing a defense 
for his basic theoretical position, Cat- 
tell has distorted the nature of my 
arguments to the point that a further 
brief clarification is needed. 

Cattell states several times in his 
paper (Cattell, 1961) that since the 
evidence did not support his theory, 
I concluded that the evidence dis- 
proved his theory. In rebuttal I 
need only quote two sentences from
my earlier paper.


It is apparent that the present evidence 
does not support the claim for “secure linkage”
of BR and Q factors. This does not
necessarily imply that future research using
more reliable and factor pure measures may 
not still prove Cattell’s proposition to be cor- 
rect (p. 208). 


My critique was based on a ques- 
tion of fact, not of theory. Cattell 
has conceded this question of fact, 
as he must, but then he sets up for 
attack a question of theory which I 
did not raise. I did go on to indicate 
on logical grounds why I felt com- 
plete confirmation of his theory was 
exceedingly unlikely, and I see noth- 
ing in his present paper to change 
this opinion. The demonstration of a 
few “matchings” in the extraversion
area, where on psychological grounds 
one would most expect self-percep- 
tions and behavior ratings to overlap, 
can hardly be accepted as firm evi- 
dence for his general theoretical posi- 
tion. 


CATTELL REPLIES TO BECKER’S “COMMENTS”


RAYMOND B. CATTELL 
University of Illinois 


In his additional comments, Becker 
expressly concedes that my theory 
has not been disproved. It is still 
odd that he objects to my saying that 
he considered the theory untrue, since 
he again says that it is “exceedingly 
unlikely,” and, to a scientist, “true”
and “untrue” mean “highly probable”
or “highly improbable,” at
least since the time of Victorian
physics. 

The positive conceptual and ex- 
perimental contributions of my paper 
appearing since his comments, he 
either misses or ignores, since they 
show: (a) that it was impossible for 
him to reach any intelligible conclusion
on the theory without recognizing
and developing the necessary
corrections for attenuation and per- 
turbation, and (b) that the facts 
which he says I must and do recog- 
nize are those chosen by Becker from 
experiments with older techniques. 
Science moves on, and the new facts 
which I present from technically
more advanced designs show that the
same factor simultaneously loads on 
the hypothesized markers for both the 
rating and the questionnaire factors. 
His statement that I concede his facts 
is therefore ambiguous. 


(Received December 26, 1960) 